Solr, clustering (Carrot2) and NoClassDefFoundError

I'm running Solr 3.4 and would like to use the ClusteringComponent.
Following this tutorial: http://wiki.apache.org/solr/ClusteringComponent in combination with the default entries in solrconfig.xml, I have the following configuration in solrconfig.xml:
<searchComponent name="clustering"
enable="${solr.clustering.enabled:true}"
class="org.apache.solr.handler.clustering.ClusteringComponent" >
<!-- Declare an engine -->
<lst name="engine">
<str name="name">default</str>
<str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
<str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
</lst>
<lst name="engine">
<str name="name">stc</str>
<str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
</lst>
</searchComponent>
<requestHandler name="/cl" class="solr.SearchHandler" >
<lst name="defaults">
<str name="echoParams">explicit</str>
<bool name="clustering">true</bool>
<str name="clustering.engine">default</str>
<bool name="clustering.results">true</bool>
<!-- Fields to cluster on -->
<str name="carrot.title">UEBSCHRIFT</str>
<str name="carrot.snippet">TEXT</str>
</lst>
</requestHandler>
So when I try to use the request handler at http://server:8080/solr/mycore/cl?q=*:*, I get the following Java exception:
java.lang.NoClassDefFoundError: com.carrotsearch.hppc.ObjectContainer
at java.lang.J9VMInternals.verifyImpl(Native Method)
at java.lang.J9VMInternals.verify(J9VMInternals.java:72)
at java.lang.J9VMInternals.initialize(J9VMInternals.java:134)
at org.carrot2.text.preprocessing.pipeline.BasicPreprocessingPipeline.<init>(BasicPreprocessingPipeline.java:106)
at org.carrot2.text.preprocessing.pipeline.CompletePreprocessingPipeline.<init>(CompletePreprocessingPipeline.java:32)
at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.<init>(LingoClusteringAlgorithm.java:129)
at java.lang.J9VMInternals.newInstanceImpl(Native Method)
at java.lang.Class.newInstance(Class.java:1325)
at org.carrot2.util.pool.SoftUnboundedPool.borrowObject(SoftUnboundedPool.java:80)
at org.carrot2.core.PoolingProcessingComponentManager.prepare(PoolingProcessingComponentManager.java:128)
at org.carrot2.core.Controller.process(Controller.java:333)
at org.carrot2.core.Controller.process(Controller.java:240)
at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:136)
at org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:91)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:735)
Caused by: java.lang.ClassNotFoundException: com.carrotsearch.hppc.ObjectContainer
at java.lang.Throwable.<init>(Throwable.java:80)
at java.lang.ClassNotFoundException.<init>(ClassNotFoundException.java:76)
at java.net.URLClassLoader.findClass(URLClassLoader.java:419)
at java.lang.ClassLoader.loadClass(ClassLoader.java)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:923)
at java.lang.ClassLoader.loadClass(ClassLoader.java:609)
... 31 more
The point is that I have no idea what this means. I've been searching for hours without finding a solution.
By the way, I'm running Tomcat with the following option:
export CATALINA_OPTS="-Dsolr.clustering.enabled=true"
(Is this still required in Solr 3.4?)
The Catalina option is part of the java command, as you can see with ps -efa:
/usr/lib64/jvm/java-1_6_0-ibm-1.6.0/jre//bin/java
-Djava.util.logging.config.file=/opt/tomcat6/conf/logging.properties -Xms2048m -Xmx2048m -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Dsolr.clustering.enabled=true -Djava.endorsed.dirs=/opt/tomcat6/endorsed -classpath /opt/tomcat6/bin/bootstrap.jar -Dcatalina.base=/opt/tomcat6
-Dcatalina.home=/opt/tomcat6 -Djava.io.tmpdir=/opt/tomcat6/temp org.apache.catalina.startup.Bootstrap start
Does anyone have an idea what I could do to solve this problem?
// Update:
If I add hppc-0.3.4-jdk15.jar, I get the following error:
java.lang.NoClassDefFoundError: org.apache.mahout.math.matrix.DoubleMatrix2D
at java.lang.J9VMInternals.verifyImpl(Native Method)
at java.lang.J9VMInternals.verify(J9VMInternals.java:72)
at java.lang.J9VMInternals.initialize(J9VMInternals.java:134)
at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.<init>(LingoClusteringAlgorithm.java:134)
[...]
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.matrix.DoubleMatrix2D
at java.lang.Throwable.<init>(Throwable.java:80)
at java.lang.ClassNotFoundException.<init>(ClassNotFoundException.java:76)
at java.net.URLClassLoader.findClass(URLClassLoader.java:419)
at java.lang.ClassLoader.loadClass(ClassLoader.java)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:923)
at java.lang.ClassLoader.loadClass(ClassLoader.java:609)
... 29 more
It looks like I have to install a Mahout archive, but I thought all the packages needed for clustering were included in Solr 3.4. Am I on the wrong track?

If you are using Solr with Tomcat as a separate instance, you need to copy the jars so that they are available to Solr.
Quote from README.txt:
NOTE: This Solr example server references certain Solr jars outside of
this server directory for non-core modules with statements in
solrconfig.xml. If you make a copy of this example server and wish to
use the ExtractingRequestHandler (SolrCell), DataImportHandler (DIH),
UIMA, the clustering component, or other modules in "contrib", you
will need to copy the required jars into solr/lib or update the paths
to the jars in your solrconfig.xml.
Check the paths to the clustering and Carrot2 jars in solrconfig.xml.
You are probably missing hppc-0.3.4-jdk15.jar.
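As a rough sketch of the "update the paths" option, the <lib> directives in solrconfig.xml could point at the clustering contrib like this; the dir values assume the stock Solr 3.4 example layout (contrib/ and dist/ two levels above the core directory) and need to be adjusted to your Tomcat setup:
<!-- assumed paths: adjust dir to wherever your contrib/ and dist/ folders actually live -->
<lib dir="../../contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="../../dist/" regex="apache-solr-clustering-\d.*\.jar" />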

Why not use Solr's default packaging (this is officially supported)? It ships with Jetty and will save you the classpath headaches, because everything is already configured.
To answer your question: you'll need all the JARs from Solr's default clustering extension folder; for 4.0 alpha this is contrib/clustering/lib/*.jar:
carrot2-core-3.5.0.jar
hppc-0.3.3.jar
jackson-core-asl-1.7.4.jar
jackson-mapper-asl-1.7.4.jar
mahout-collections-0.3.jar
mahout-math-0.3.jar
simple-xml-2.4.1.jar

Did you add the Mahout Math package?
It seems to be a separate package.
NoClassDefFoundError: org.apache.mahout.math.matrix.DoubleMatrix2D
^^^^^^^^^^^^^^^^^^^^^^

With Solr 4, copy this file into the conf folder:
http://svn.apache.org/repos/asf/labs/alike/trunk/demo/solrhome/collection1/conf/solrconfig.xml

Related

How can I use the /export request handler via SolrJ?

I'm using Solr 4.10.
I have enabled the /export request handler for an index by adding this to the solrconfig.xml (as mentioned here: https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets):
<requestHandler name="/export" class="solr.SearchHandler">
<lst name="invariants">
<str name="rq">{!xport}</str>
<str name="wt">xsort</str>
<str name="distrib">false</str>
</lst>
<arr name="components">
<str>query</str>
</arr>
</requestHandler>
Now I can use http://localhost:8983/solr/index/select?.... as well as http://localhost:8983/solr/index/export?.... from a browser or curl.
But I cannot get it to run properly using SolrJ.
I tried (as suggested here: https://lucene.apache.org/solr/4_10_0/solr-solrj/index.html):
SolrQuery query = new SolrQuery();
...
query.setRequestHandler("/export");
...
httpSolrServer.query(query);
The query now has a parameter &qt=export. It blew up giving me:
org.apache.solr.client.solrj.SolrServerException: Error executing query
Further searching suggested using a SolrRequest instead of a SolrQuery, so I tried that:
SolrQuery query = new SolrQuery();
...
query.setRequestHandler("/export");
SolrRequest solrRequest = new QueryRequest(query);
httpSolrServer.request(solrRequest);
Now I get:
java.nio.charset.UnsupportedCharsetException: gzip
Any ideas?
---edit---
I found an option in httpSolrServer.request() to add a ResponseParser. I found four ResponseParsers and tried them all; the only one that worked was NoOpResponseParser. Now I get the correct results, but dumped as a plain string in a single entry of a NamedList. I tried to parse it as JSON, but it's not in proper format: every 30,000 documents there is a missing comma.
I went back to solrconfig.xml and changed wt in the /export handler from xsort to json. The response format has changed, but it is still not in proper format (the data is incomplete), and XML is not supported.
I'm truly baffled.

Solr and schemaless

I'm using Cloudera 5.4 with Solr 4.10.2 and I would like to activate schemaless mode.
I've edited solrconfig.xml with:
<schemaFactory class="ManagedIndexSchemaFactory">
<bool name="mutable">true</bool>
<str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
I don't know if I need something else for this version. I have read the Solr documentation (https://docs.lucidworks.com/display/solr/Managed+Schema+Definition+in+SolrConfig).
When I try to index a document, I get an exception:
Exception in thread "main" org.apache.solr.client.solrj.impl.CloudSolrServer$RouteException: ERROR: [doc=5.417393032179468E7] unknown field 'campo1'
I have seen that there is a schema.xml.bak, as the documentation says there should be.
Do I need to do something else?
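For context, the ManagedIndexSchemaFactory by itself only makes the schema mutable; in stock Solr 4.10 the actual schemaless behaviour (creating unknown fields on the fly) comes from an update processor chain with AddSchemaFieldsUpdateProcessorFactory wired to the /update handler. A trimmed sketch along the lines of the stock schemaless example config; the chain name, field types and the single type mapping are illustrative and may not match the Cloudera-generated config:
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <!-- create a schema field for every unknown field; text_general and tlong are assumed to exist in the schema -->
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
    <str name="defaultFieldType">text_general</str>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Long</str>
      <str name="fieldType">tlong</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">add-unknown-fields-to-the-schema</str>
  </lst>
</requestHandler>
Without a chain like this, documents containing fields that are not in the managed schema are rejected with exactly the "unknown field" error shown above.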

Solr indexing fails on server.request(up)

While indexing into Solr, I am getting an error like this:
HTTP Status 500 - lazy loading error
org.apache.solr.common.SolrException: lazy loading error at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:260)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
The URL formed is: http://localhost:8080/solr/update/extract?Latitude=51.9125&Longitude=179.5&commit=true&waitFlush=true&waitSearcher=true&wt=javabin&version=2
(I have configured Tomcat using XAMPP on a Windows machine.)
I have been following Stack Overflow and various other blogs/forums and have tried to debug it, but after hours I could not find anything.
I have added the following things in the solr.xml:
<maxFieldLength>10000</maxFieldLength>
<writeLockTimeout>60000</writeLockTimeout>
<commitLockTimeout>60000</commitLockTimeout>
<lockType>simple</lockType>
<unlockOnStartup>true</unlockOnStartup>
<reopenReaders>true</reopenReaders>
<requestParsers enableRemoteStreaming="true"
multipartUploadLimitInKB="2048000" />
<lst name="defaults">
<!--str name="echoParams">explicit</str-->
<!--int name="rows">10</int-->
<!--str name="df">text</str-->
<str name="Latitude">Latitude</str>
<str name="Longitude">Longitude</str>
</lst>
I even tried adding the following to solrconfig.xml and restarting Tomcat:
<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
<lst name="defaults">
<str name="ext.map.Last-Modified">last_modified</str>
<bool name="ext.ignore.und.fl">true</bool>
</lst>
</requestHandler>
On the Java console it shows an error:
org.apache.solr.common.SolrException: Internal Server Error
I realized the issue might be because of my Solr home path. I created a new directory, copied all the config files there, and set that as my Solr home. I later updated solrconfig.xml, correcting the paths to all the jars.
I also tried adding the pdfbox and fontbox jars to the Solr lib folder and restarting Tomcat.
My Java code is:
String urlString = "http://localhost:8080/solr";
SolrServer server = new CommonsHttpSolrServer(urlString);
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
String fileName=f.toString();
up.addFile(new File(fileName));
up.setParam("Latitude", Latitude);
up.setParam("Longitude", Longitude);
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(up);
(Port 8080 is the one I have configured.)
Solr indexing is still not working at my end. I have spent hours debugging this and trying to figure it out. It would be really great if you could give me a hint or point out anything I am doing wrong.
For reference, I have already tried:
http://wiki.apache.org/solr/FrontPage
http://wiki.apache.org/solr/ContentStreamUpdateRequestExample
http://wiki.apache.org/solr/UpdateRichDocuments
http://wiki.apache.org/solr/ExtractingRequestHandler#Configuration
http://lucene.472066.n3.nabble.com/Problem-using-ExtractingRequestHandler-with-tomcat-td494930.html
http://lucene.472066.n3.nabble.com/Internal-Server-Error-td715713.html
How to index pdf's content with SolrJ?
Finally, I found a way to solve this.
Just modify SOLR_HOME/conf/solrconfig.xml, change the dir attribute of all the <lib> tags from "../../dist/" to "../dist/", and save the file.
You hit this problem when the solr directory has been moved out of apache-solr-x.x.x\example to some other place, so the relative path "../../dist/" needs to be changed accordingly.
Remember to restart your Tomcat and see if it works.
Do you have solr-cell.jar in your classpath?
The ExtractingRequestHandler is in solr-cell.jar, which is not packaged with the default Solr server.
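As a sketch, the Solr Cell jars can be pulled in with <lib> directives in solrconfig.xml; the dir values below are the ones from the stock example layout and have to be adapted to wherever dist/ and contrib/ sit relative to your Solr home (which is the same relative-path issue described in the previous answer):
<!-- assumed locations of the extraction (Solr Cell) jars: adjust dir to your layout -->
<lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />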

Integrate Nutch with Solr for Advanced Search Options

I am using apache-nutch-1.4 with apache-solr-3.2.0.
I have successfully integrated Nutch with Solr.
When I query the following:
mysite/solr/select/?q=bone&version=2.2&start=0&rows=10&indent=on
it gives me the following result:
<doc>
<float name="boost">1.0117649</float>
<str name="cache">content</str>
<str name="content"></str>
<str name="digest">9bf016ea547cf50be81e468553c483de</str>
<str name="id">http://107.21.107.118:8000/</str>
<str name="segment">20120214151903</str>
<str name="title">Home</str>
<date name="tstamp">2012-02-14T10:19:08.215Z</date>
<str name="url">mysite:8000/</str>
</doc>
The problem is: when I have to search for bone in a particular category, like Cancer or Colorectal & Digestive, what parameter do I need to add to the above query to get records for that specific category only?
mysite:8983/solr/select/?q=bone&????????
I have URLs like:
mysite:8000/Encyclopedia/Patient Centers/
mysite:8000/Encyclopedia/Patient Centers/Cancer/
mysite:8000/Encyclopedia/Patient Centers/Cancer/Colorectal & Digestive/
My schema.xml file looks like this (I have also added it to the Nutch directory):
http://dpaste.org/MTDF2/
My reputation is not 10, so I cannot attach anything here; that's why I had to paste schema.xml on dpaste.org.
Sorry for any inconvenience this may have caused.
I will really appreciate your advice and suggestions.
First, you have to store Cancer and Colorectal & Digestive in a category field. You can use http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PathHierarchyTokenizerFactory for that (see the sketch below). Then
the URLs could look like mysite:8983/solr/select/?q=bone&fq=category:Cancer
http://wiki.apache.org/solr/CommonQueryParameters#fq
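A minimal sketch of such a field in schema.xml, assuming the category is indexed as a slash-separated path taken from the URL; the field and type names are illustrative, not from the posted schema:
<!-- hypothetical field type: splits "Encyclopedia/Patient Centers/Cancer" into its ancestor paths at index time -->
<fieldType name="descendent_path" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
</fieldType>
<field name="category" type="descendent_path" indexed="true" stored="true" />
With a field like this, a filter query such as fq=category:"Encyclopedia/Patient Centers/Cancer" should match every document under that part of the hierarchy.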

Can SOLR configuration files be located in parent folders?

I have configured the QueryElevation searchComponent of SOLR as documented here:
http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field
However, I would like to load the elevate.xml file from several folders above the default one.
I cannot get this to work... all of the following generate an error:
<str name="config-file">../../elevate.xml</str>
<str name="config-file">..\..\elevate.xml</str>
<str name="config-file">c:/elevate.xml</str>
<str name="config-file">c:\elevate.xml</str>
Per the Solr wiki:
Path to the file that defines query elevation. This file must exist in:
${instanceDir}/conf/${config-file}, or
${dataDir}/${config-file}
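So config-file is resolved only against those two locations, which is why the relative and absolute paths above all fail. A minimal sketch that keeps the file inside the core's conf directory; the component name and queryFieldType are the stock example values, not taken from the question:
<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <!-- elevate.xml must live in ${instanceDir}/conf/ or in ${dataDir} -->
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>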
