Solr field not reflected in schema browser

I created a Solr core using bin/solr create -c core1, then copied the schema.xml file from the basic configset to the core1/conf folder and added a field:
<field name="title" type="text" indexed="true" stored="true"/>
But this field is not reflected in the schema browser.
What configuration do I need so that new fields show up in the schema browser in the Solr admin UI?
I am using Solr 5.3.1.

By default, when you create a Solr core it will use a managed schema. You will see the following configuration in solrconfig.xml after the core is created.
<schemaFactory class="ManagedIndexSchemaFactory">
<bool name="mutable">true</bool>
<str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
Above this configuration you will find comments on how to use the managed schema. Comment this factory out and uncomment the following to use schema.xml instead:
<schemaFactory class="ClassicIndexSchemaFactory"/>
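Alternatively, you can keep the managed schema and add fields through the Schema API instead of editing files by hand. A minimal sketch, assuming the core is named core1 and a text_general field type exists in your configset:
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"title","type":"text_general","indexed":true,"stored":true}}' http://localhost:8983/solr/core1/schema
Fields added this way appear in the schema browser without any file edits.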

If you edit the files on disk, you then need to reload the core: go to http://yourhost:8983/solr/#/~cores/core1 and press the "Reload" button.
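Reloading also works through the Core Admin API; a one-liner, assuming Solr runs locally on the default port:
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=core1"
After the reload, the new title field should appear in the schema browser.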

Related

Solr Cloud in Kubernetes Indexing error - HttpSolrCall Unable to write response

I am trying to do indexing with Solr Cloud running in a Kubernetes cluster. I defined a Data Import Handler and I can see the configuration in the Solr UI.
The Data Import Handler will allow me to trigger a SQL query and fetch the Polygon data for building the index.
<dataConfig>
<dataSource
type="JdbcDataSource" processor="XPathEntityProcessor"
driver="oracle.jdbc.driver.OracleDriver" ...... />
<document>
<entity name="pcode" pk="PC" transformer="ClobTransformer"
query="select PCA as PC, GEOM as WPOLYGON,
SBJ,PD,CD
from SCPCA
where SBC is not null">
<field column="PC" name="pCode" />
<field column="WPOLYGON" name="wpolygon" clob="true"/>
<field column="SBJ" name="sbjcode" clob="true"/>
<field column="PD" name="portid"/>
<field column="CD" name="cancid"/>
</entity>
</document>
</dataConfig>
After triggering the import via the UI, it runs for around 1 minute and fails with the following errors in the console:
qtp1046545660-14) [c:sba s:shard1 r:core_node6 x:sba_shard1_replica_n4] o.a.s.u.p.LogUpdateProcessorFactory [sba_shard1_replica_n4] webapp=/solr path=/dataimport params={core=sba&debug=true&optimize=false&indent=on&commit=true&name=dataimport&clean=true&wt=json&command=full-import&_=164589234356779&verbose=true}{deleteByQuery=*:*,commit=} 0 70343
2022-02-26 16:30:38.092 INFO (qtp10465423460-14) [c:sba s:shard1 r:core_node6 x:sba_shard1_replica_n4] o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down => org.eclipse.jetty.io.EofException: Reset cancel_stream_error
at org.eclipse.jetty.http2.server.HTTP2ServerConnectionFactory$HTTPServerSessionListener.onReset(HTTP2ServerConnectionFactory.java:159)
org.eclipse.jetty.io.EofException: Reset cancel_stream_error
I am using Solr Cloud 8.9 with Solr operator 0.5.0. I checked the Jetty config and it has an idle timeout of 120000.
Has anyone faced similar issues and fixed them?
Jetty's EofException almost always means one specific thing: the client closed the connection before Solr could respond, so when Solr finally finished processing and tried to have Jetty send the response, there was nowhere to send it -- the connection was gone.
In my case I was doing a full data import to Solr and it failed with this HttpSolrCall Unable to write response EofException. This was happening due to issues with my managed-schema / schema.xml: I forgot to add all the columns correctly to the schema, which caused the indexing to fail with the EofException. After correcting my schema.xml it worked fine.
It is a bit of a confusing error, since a wrong schema does not obviously suggest an EofException. Still, if you hit it in Solr, always check schema.xml / managed-schema for discrepancies.
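For illustration, every column mapped by the DIH entity in the question needs a matching field declaration in the schema; a sketch, where the field types are assumptions to adapt to your data:
<field name="pCode" type="string" indexed="true" stored="true"/>
<field name="wpolygon" type="text_general" indexed="true" stored="true"/>
<field name="sbjcode" type="text_general" indexed="true" stored="true"/>
<field name="portid" type="string" indexed="true" stored="true"/>
<field name="cancid" type="string" indexed="true" stored="true"/>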

Integrating grobid with tika and solr

I'm using Solr to index journal articles. Using the out-of-the-box configuration, it indexed the text of the documents, but I'm looking to use Grobid to pull out the authors, title, affiliations, etc. I got Grobid up and running as a service.
I added
<str name="tika.config">/path/to/tika-config.xml</str>
to the requestHandler for /update/extract in solrconfig.xml
The tika-config looks like:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<properties>
<parsers>
<parser class="org.apache.tika.parser.journal.JournalParser">
<mime>application/pdf</mime>
</parser>
</parsers>
</properties>
I'm getting a ClassNotFound exception when I try to import a document, but can't figure out where to set the classpath to fix it.
As mentioned on the Solr users list, the latest version of Solr (6.0.0) is using a version of Tika (1.7) that predates the addition of Grobid (which came in with Tika 1.11). To follow the upgrade to Tika 1.13, see SOLR-8981.
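For reference, once you are on a Tika version that includes the Grobid journal parser, the usual way to get extra parser jars onto Solr's classpath is a <lib> directive in solrconfig.xml; the directory below is a hypothetical placeholder:
<lib dir="/path/to/extra/tika-jars/" regex=".*\.jar" />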

Errors while trying to configure Solr 5.3.1 on Windows 10

I'm trying to set up a very basic configuration of Solr to read some text from a MySQL table and index it. I'm following the steps in the DIH Quick Start document.
The document doesn't tell you where to place solrconfig.xml.
At first I tried placing it under the solr-5.3.1 folder (next to bin). That failed. Then I noticed the "add core" button was looking for it in server\solr\new_core, so I put it there, but then got another error.
My data import handler looks like this:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
And here's data-config.xml:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/ctcrets"
user="root"
password="xxxx"/>
<document>
<entity name="id"
query="select RETS_STAGE1_QUEUE_ID as id, LN_LIST_NUMBER as name, xmlText as desc from RETS_STAGE1_QUEUE">
</entity>
</document>
</dataConfig>
What could be the problem?
The document assumes you already know the solr.home [1] directory structure. On top of that, I think it assumes you started the sample Solr instance (e.g. ./solr start -p 8984), where everything should already be set up.
Once started, you can see on the dashboard where exactly the configuration is located. Go there, change the files as suggested, and RELOAD the core through the admin console (CoreAdmin). If you want, you can also do a stop / restart.
As side notes:
the DIH is not part of the Solr core, so you need a "lib" directive within solrconfig.xml; as far as I remember, the sample config already has those directives, so you don't need to "import" the DIH lib yourself
the JDBC driver that allows the connection to the database is not included, so your classpath (i.e. the JVM or Solr classpath, through the same lib directive) must include this additional lib (see the sketch below)
[1] http://www.solrtutorial.com/configuring-solr.html
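A minimal sketch of both lib directives in solrconfig.xml; the dist path assumes the standard Solr 5.x layout, and the MySQL connector location is a placeholder to adapt:
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<lib dir="/path/to/libs/" regex="mysql-connector-java-.*\.jar" />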

Can not use ICUTokenizerFactory in Solr

I am trying to use ICUTokenizerFactory in the Solr schema. This is how I have defined the field and fieldType.
<fieldType name="text_icu" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.ICUTokenizerFactory"/>
</analyzer>
</fieldType>
<field name="fld_icu" type="text_icu" indexed="true" stored="true"/>
And when I start Solr, I get this error:
Plugin init failure for [schema.xml] fieldType "text_icu": Plugin init failure for [schema.xml] analyzer/tokenizer: Error loading class 'solr.ICUTokenizerFactory'
I have searched for this with no success. I don't know if I am missing something or there is some problem in the schema.
If someone has tried ICUTokenizerFactory, please suggest what the problem could be.
Add this at the top of your solrconfig.xml:
<config>
<lib dir="${user.dir}/../contrib/analysis-extras/lucene-libs/" />
<lib dir="${user.dir}/../contrib/analysis-extras/lib/" />
This assumes that you are running from the example directory with solr.solr.home set to your instance. Otherwise, just use absolute paths to your Solr installation.
You can also copy all those jars into the lib directory (under your core, not the solr home). But the above is the easier way.
From the Wiki:
Lucene provides support for segmenting these languages into syllables with solr.ICUTokenizerFactory in the analysis-extras contrib module. To use this tokenizer, see solr/contrib/analysis-extras/README.txt for instructions on which jars you need to add to your SOLR_HOME/lib
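For the copy-the-jars route, a sketch of the shell commands from the Solr install directory, assuming a Solr 5.x layout and a core named core1 (jar versions vary by release):
cp contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-*.jar server/solr/core1/lib/
cp contrib/analysis-extras/lib/icu4j-*.jar server/solr/core1/lib/
Reload the core afterwards so the new analyzer class is picked up.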

Configure DataImportHandler in SolrCloud with ZooKeeper

I have a SolrCloud configured like the one described in "exploration of SolrCloud"; the difference is that I use Solr 4.0.0 Beta. Briefly, the configuration:
ZooKeeper on default port 2181
3 instances of Solr running on different ports
This is just for testing purposes. The desired configuration is 3 ZooKeeper instances (one for every Solr instance). I managed to index some XML files with the curl command.
Questions:
How can I configure DIH for the collection? I managed to change solrconfig.xml (the config for the dataimport handler) and add the proper driver for the DB connection in lib, but in the Solr admin I get "sorry, no dataimport-handler defined!" The changes can be watched in ZooKeeper (I see the data_config.xml), and in the Solr admin panel I can see the updated version of solrconfig.xml.
Is there any good tutorial for a production deployment of SolrCloud (with something like the desired configuration mentioned before) on a single machine or multiple machines under Ubuntu 12.04 LTS?
Any advice would be appreciated! Thanks in advance!
Normally the DIH config has nothing to do with whether you're using a single Solr instance or multiple instances in a SolrCloud config. DIH will write data to the current instance's Lucene index, and then it's up to SolrCloud (coordinated through ZooKeeper) to spread it around to the other instances.
Make sure your DIH is properly configured:
In solrconfig.xml, all necessary libraries are loaded. This means the two DIH jars:
<lib dir="../../../dist/" regex="solr-dataimporthandler-4.3.0.jar" />
<lib dir="../../../dist/" regex="solr-dataimporthandler-extras-4.3.0.jar" />
as well as other jars you may need (like the database JDBC driver, etc.).
Still in solrconfig.xml, make sure the DIH handler is declared, something like this:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
Finally, the config file you declared in the DIH handler (data-config.xml) should be in the same "conf" dir as solrconfig.xml and should have proper content, something like:
<dataConfig>
<dataSource type="JdbcDataSource" name="myDataSource" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:#someHost:1521:someDb" user="someUser" password="somePassword" batchSize="5000"/>
<document name="myDoc" >
<entity name="myDoc" dataSource="myDatasource" transformer="my.custom.Transformer" query="select col1, col2, col3 from table1 where whatever" />
</document>
</dataConfig>
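One more SolrCloud-specific step: the edited config must be uploaded to ZooKeeper and the collection reloaded before the handler shows up. A sketch using the zkcli tool shipped with Solr 4.x; the paths, confname, and collection name are assumptions to adapt:
cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig -confdir /path/to/conf -confname myconf
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"
The "sorry, no dataimport-handler defined!" message typically disappears once every replica is running the updated solrconfig.xml.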
