Can't index rich documents on Solr 3.6 or Solr 4.0 using update/extract: getting "#500 lazy loading error"

I've just started learning Solr, and for the last 3 days I've been stuck: I cannot
index rich documents on Solr 3.6 or 4.0. I am using Windows 7 64-bit.
Here is what I tried.
First I installed Solr 3.6 with tomcat-jetty, using BitNami Apache.
1. Tried the -Durl command. What I got:
error #500 lazy loading error
2. Downloaded curl for my Windows machine and tried curl. I got: error #500 lazy loading error
3. Copied a program from the Solr tutorial that uploads a file using SolrJ
into NetBeans IDE and tried to index a PDF file using
update/extract.
Then I got:
org.apache.solr.common.SolrException: Server at
"myServer:port/solr" returned non ok status:500, message:Internal
Server Error
4. Changed solrconfig.xml to remove startup=lazy from the update/extract
request handler and got the same thing.
I re-installed Solr 3.6 but still can't get it to work. 4.0 gives the same error.
Some other request handlers, such as /browse, show the same problem.
Should I switch to Linux?

It looks like the packager (Bitnami) did not include the extraction libraries, even though they left Solr configured to use them. You can ask them to resolve it, or you can deploy Solr yourself.
Here's how to deploy Solr on Tomcat. It's equally easy to install on Windows, and it starts as a Windows service. Once installed, to enable rich document support, copy the contents of contrib/extraction/lib/ to a directory and point sharedLib in solr.xml to that directory. If you have used that guide, you will understand those new terms :-)
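For illustration, the copy step could be scripted roughly as follows. This is only a minimal Python sketch and not part of the linked guide: `copy_jars` is a hypothetical helper name, and it assumes the jars sit directly inside the source directory.

```python
import shutil
from pathlib import Path


def copy_jars(src_dir, dest_dir):
    """Copy every *.jar file from src_dir into dest_dir (hypothetical helper).

    Returns the sorted list of jar file names that were copied.
    """
    dest = Path(dest_dir)
    # Create the shared lib directory if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in sorted(Path(src_dir).glob("*.jar")):
        shutil.copy2(jar, dest / jar.name)
        copied.append(jar.name)
    return copied
```

It would be called with the contrib/extraction/lib directory of the Solr download as `src_dir` and the directory that sharedLib points to as `dest_dir`. Solr still has to be restarted afterwards to pick up the jars.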

Related

Mint 18.1, Ckan, SOLR schema version not supported

I'm a front-end developer and I need to build a CKAN theme. To do so, I need a working source install of CKAN on my system. I'm using Mint 18.1 and installing CKAN 2.6.2.
Following the installation steps in CKAN's docs, I got a warning and an error at step 6.
The last line says SOLR schema version not supported: 2.7. Supported versions are [2.3] and I can't proceed with the installation. Searching the Internet, I found people with the same problem, but they were using Docker (I have no idea what that is) and their solutions didn't work for me.
Because I have very little time to build this theme, I gave up on CKAN 2.6.2 and installed 2.5.2, and everything worked fine.
The SOLR schema that comes with CKAN 2.6.2 is version 2.3, so somehow you have got 2.7, which is provided with later versions of CKAN. Maybe you installed CKAN master and the schema is lingering from then.
Here are some steps so that you can find out where the problem is:
You can check the version of the schema in the CKAN source repo on your disk:
grep 'name="ckan" version=' /usr/lib/ckan/default/src/ckan/ckan/config/solr/schema.xml
You would have then installed this file into Solr (in Step 5, using the 'ln' command). You can check the version in Solr:
grep 'name="ckan" version=' /etc/solr/conf/schema.xml
(When this file is changed, you need to restart SOLR (i.e. jetty) for it to take effect - see the docs again).
You can see what schema SOLR is actually using:
curl -s 'http://localhost:8983/solr/admin/file/?contentType=text/xml;charset=utf-8&file=schema.xml'|grep 'name="ckan" version='
Please do feed back on these.
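The same check that the grep commands perform can also be sketched in Python, which may be handy if grep is not available. This is an illustrative sketch only: `schema_version` and `is_supported` are hypothetical names, and the supported list is simply copied from the error message in the question.

```python
import xml.etree.ElementTree as ET

# Taken from the error message in the question ("Supported versions are [2.3]").
SUPPORTED_VERSIONS = ["2.3"]


def schema_version(schema_xml):
    """Return the version attribute of the root <schema> element."""
    root = ET.fromstring(schema_xml)
    if root.tag != "schema":
        raise ValueError("root element is not <schema>")
    if "version" not in root.attrib:
        raise ValueError("<schema> has no version attribute")
    return root.attrib["version"]


def is_supported(schema_xml):
    """True if the schema version is one CKAN 2.6.2 reports as supported."""
    return schema_version(schema_xml) in SUPPORTED_VERSIONS
```

Passing the contents of each of the schema.xml files mentioned above to `schema_version` should show which copy carries 2.7.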
It sounds like your Docker container for SOLR is a newer version that is not compatible with CKAN 2.6.2.

Error when integrating Nutch and Solr (HTTP ERROR 500)

I have Ubuntu Linux 12.04 installed and I'm trying to install Nutch 1.5.1 and Solr 3.6.1 and integrate them to crawl seed URLs.
I'm using this tutorial to get it working.
I followed the steps before 3.2, skipped to step 4, and I can access
localhost:8983/solr/admin/
without error.
But when I get to step 6 and copy schema.xml from Nutch's conf folder to Solr's example/solr/conf folder,
the solr/admin page shows a Java error.
How can I handle that?
One more thing to ask:
I have another tutorial for this that looks good, but its first step says to add some code to the nutch-site.xml file in the /conf/ and /runtime/local/conf/ folders,
but there is no runtime folder in the Nutch folder. Step 4 mentions this folder too.
Any suggestions?
Thanks in advance.
This is a bit of a red herring. The line that specifies the version number, something like:
<schema name="nutch" version="1.5.1">
is causing it, because the value of version is parsed as a float. Remove the extra dot: change it to 1.5 or 1.51 to make it a valid float, then restart your Solr instance. The exception should disappear.
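To see why the extra dot matters, here is a small Python sketch. It uses Python's `float()` as a stand-in for the Java float parsing described above, which is an assumption: the two parsers are not identical in every edge case, but both reject a value with two dots. `is_valid_schema_version` is a hypothetical name.

```python
def is_valid_schema_version(value):
    """Return True if the version string parses as a float.

    Python's float() stands in for the Java parser here (an assumption).
    """
    try:
        float(value)
    except ValueError:
        return False
    return True


for candidate in ("1.5.1", "1.5", "1.51"):
    print(candidate, is_valid_schema_version(candidate))
```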
Please check whether Nutch 1.5.1 and Solr 3.6.1 are compatible (that is, whether they ship the same versions of the lucene-core and solr-solrj jars). I have had problems with incompatible versions, but not with 1.5/3.6.

Parsing (using Tika) on remote GlassFish

I'm using Tika parser to index my files into Solr. I created my own parser (which extends XMLParser). It uses my own mimetype.
I created a jar file which inside looks like this:
src
|-main
   |-some_packages
   |  |-MyParser.java
   |-resources
      |-META-INF
      |  |-services
      |     |-org.apache.tika.parser.Parser (which contains a line:some_packages.MyParser.java)
      |-org
         |-apache
            |-tika
               |-mime
                  |-custom-mimetypes.xml
In custom-mimetypes.xml I put the definition of the new mimetype, because my XML files have some special tags.
Now, here is the problem. I had been testing parsing and indexing with Solr on GlassFish installed on my local machine, and it worked just fine. Then I wanted to install it on a remote server, which has the same version of GlassFish installed (3.1.1). I copied over the Solr application and its home directory with all libraries (including the Tika jars and the jar with my custom parser). Unfortunately it doesn't work. After posting files to Solr I can see in the content-type field that it detected my custom mime type, but the fields that are supposed to be there are missing, as if the MyParser class was never run. The only fields I get are the ones from Dublin Core. I checked (by simply adding some print lines) that Tika is only using XMLParser.
Has anyone had a similar problem? How do I handle this?
The problem was that I was using Java 7 to compile my parser, but Apache Tika was compiled with Java 5...
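One way to spot this kind of mismatch is to read the header of a compiled .class file, which records the class-file version it targets. The following is a minimal Python sketch under that assumption; `class_file_java_version` is a hypothetical helper, and the mapping covers only a few releases.

```python
import struct

# Class-file major versions for a few Java releases (JVM specification).
MAJOR_TO_JAVA = {49: "5", 50: "6", 51: "7", 52: "8"}


def class_file_java_version(data):
    """Return the Java release that the given .class file bytes target."""
    if len(data) < 8:
        raise ValueError("too short to be a class file")
    # Header layout: 4-byte magic, 2-byte minor, 2-byte major (big-endian).
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return MAJOR_TO_JAVA.get(major, "unknown (major %d)" % major)
```

Reading the first bytes of MyParser.class and of a class extracted from the Tika jar, then comparing the two results, would show whether they were built for different Java releases.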

NoClassDefFoundError MimeTypeException with PDF extraction

I am getting an exception when trying to use update/extract with PDF files.
My setup is:
Ubuntu Server 11.10
Tomcat 6
Solr 3.5.0.2011.11.22.15.54.38
I can browse to solr/admin OK.
I have put all the contrib/extract and apache-solr-cell3.5.0.jar libraries into the Tomcat folder webapps/solr/WEB-INF/lib.
I am calling extract using:
curl "http://localhost:8080/solr/update/extract?uprefix=attr_&fmap.content=attr_content&commit=true" -F "file=@/path/to/my.pdf"
The error is:
java.lang.NoClassDefFoundError: org/apache/tika/mime/MimeTypeException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:383)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:425)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:461)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:248)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:239)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
Would appreciate any pointers - the only time this error seems to come up elsewhere is with Nutch and cached results.
I have tried sending the mimetype in the querystring and also a *.doc file but got the same error.
According to the error message, what you get is not a MimeTypeException. The problem is a NoClassDefFoundError, because Solr cannot load the class MimeTypeException.
Normally this class is present in tika-core.jar.
Make sure you actually have that file, and also check that you have a lib statement in your solrconfig.xml pointing to the right directory.
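A quick way to confirm that a given jar really contains the class is to look inside it, since a jar is a zip archive. This is a minimal Python sketch; `class_entry_name` and `jar_contains_class` are hypothetical helper names, and the jar path is whatever your installation uses.

```python
import zipfile


def class_entry_name(class_name):
    """Map a fully qualified class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jar_contains_class(jar_path, class_name):
    """True if the jar (a zip archive) has an entry for the given class."""
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry_name(class_name) in jar.namelist()
```

For the error above, the class to look for is org.apache.tika.mime.MimeTypeException.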
This was due to a basic error: I had copied the necessary Tika libraries (to tomcat6/webapps/solr/WEB-INF/lib) but left the jar files owned by root instead of chown-ing them to tomcat6. After setting the right ownership and restarting Tomcat, it started working OK.
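A rough way to catch this is to list the jars that the current user cannot read. The sketch below is illustrative only: `unreadable_jars` is a hypothetical helper, and because `os.access` checks the user running the script, it would need to be run as the Tomcat user to be meaningful.

```python
import os
from pathlib import Path


def unreadable_jars(lib_dir):
    """Return the names of *.jar files in lib_dir the current user cannot read."""
    return sorted(
        jar.name
        for jar in Path(lib_dir).glob("*.jar")
        if not os.access(jar, os.R_OK)
    )
```

An empty list means every jar in the directory is readable by the user the script ran as.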
I found the solution to this problem. I was using SolrJ to update my PDF index.
After deploying Solr to Tomcat, I had not copied the extraction libraries into the Tomcat webapp,
and I got all the lazy loading problems, etc.
I even tried getting Apache Tika...
until I did this:
1. Shut down Tomcat.
2. Copy the libraries from
\apache-solr-3.5.0\contrib\extraction
to
\apache-tomcat-7.0.26\webapps\solr\WEB-INF\lib
3. Start Tomcat.
Cheers

solr-cell search works for some pdfs not others

I have been searching for two days and have not been able to find an answer.
I have solr installed from the repos on an Ubuntu server running on tomcat 6. I have added the solr-cell jar and tika libraries.
I can run a curl command that works for some PDF files and indexes them fine, but it does not work for others. At first I thought that some files were corrupted, but that does not appear to be the case. There does not appear to be any major difference between the ones that work and those that don't.
The error I get is a 500 error - see example here
The curl request I make is:
$ curl 'http://mysolrserver.com:port/solr/update/extract?map.content=text&map.stream_name=id&extractOnly=true&commit=true' -F "file=@/absolute/path/to/file.pdf"
This does work for some PDFs fine, just not others.
I believe I have solr 1.4.0 installed.
Any help would be appreciated - thank you
--EDIT--
I am using Ubuntu 10.04.1 if that helps at all.
A NullPointerException is probably a bug. Report it to PDFBox and/or Tika.
OK, the nightly snapshot of Solr uses PDFBox 1.3.1, as opposed to the current stable release, which uses 0.7.*. That is a fair number of revisions in between.
I can index all the PDFs using this snapshot version of Solr, so this looks like something that will be fixed in the next stable version.
