Strange exception in MimeBodyPart.getContent() - jakarta-mail

I'm using JavaMail 1.6.1 together with Apache Tika (1.18) to parse email content as text. With Tika 1.17 it worked fine (a few errors here and there, but 99% of messages parsed). After I switched to Tika 1.18, something strange started to happen in the production environment: I began getting this exception en masse:
java.lang.IllegalArgumentException: failed to parse:
at java.awt.datatransfer.DataFlavor.<init>(DataFlavor.java:435)
at javax.activation.ActivationDataFlavor.<init>(ActivationDataFlavor.java:81)
When I tried to reproduce the issue on my local workstation, I could not. Switching back to Tika 1.17 made the exception go away.
I know it sounds very strange, but any help would be highly appreciated.
Thanks
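For context on where that message comes from: the two frames in the trace show ActivationDataFlavor delegating to the java.awt.datatransfer.DataFlavor constructor, which throws IllegalArgumentException when it cannot parse the MIME type string it is given. Below is a minimal sketch, assuming the content type reached that constructor as an empty string. That is only a guess based on nothing following the colon in the message. The class name DataFlavorProbe and the method probe are made up for illustration.

```java
import java.awt.datatransfer.DataFlavor;

public class DataFlavorProbe {
    /**
     * Hypothetical helper: returns null when DataFlavor accepts the MIME type,
     * otherwise the message of the IllegalArgumentException it throws.
     */
    public static String probe(String mimeType) {
        try {
            new DataFlavor(mimeType, "probe");
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A well-formed type, then the assumed failure case (empty string).
        System.out.println("text/plain -> " + probe("text/plain"));
        System.out.println("(empty)    -> " + probe(""));
    }
}
```

If the empty case reproduces your message, logging the raw Content-Type header of the failing parts in production may show which messages trigger it.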

Related

Solr 4 Data Import Handler doesn't work

I am deploying Solr 4.3.0 in Tomcat 7.
Everything works fine but DataImportHandler. I can go to the
http://localhost:8080/solr/#/collection1/dataimport//dataimport
screen and see the dataimport options load at the UI.
However, I can't see any of my entities in the "entity" combo box. Inside the configuration box, on the right side, I see the error below.
Apache Tomcat/7.0.41 - Error report
HTTP Status 500 - Filter execution threw an exception
type: Exception report
message: Filter execution threw an exception
description: The server encountered an internal error that prevented it from fulfilling this request.
exception:
javax.servlet.ServletException: Filter execution threw an exception
root cause:
java.lang.NoClassDefFoundError: org/apache/log4j/spi/LoggingEvent
org.apache.solr.logging.log4j.EventAppender.append(EventAppender.java:35)
org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
org.apache.log4j.Category.callAppenders(Category.java:206)
org.apache.log4j.Category.forcedLog(Category.java:391)
org.apache.log4j.Category.log(Category.java:856)
org.slf4j.impl.Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:498)
org.apache.solr.common.SolrException.log(SolrException.java:119)
org.apache.solr.servlet.ResponseUtils.getErrorInfo(ResponseUtils.java:58)
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:691)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
note: The full stack trace of the root cause is available in the Apache Tomcat/7.0.41 logs.
The puzzling part is that I do have "log4j-1.2.16.jar" on the classpath (it's in Tomcat's lib dir).
Has anyone run into this problem?
Try following the steps outlined in Using the example logging setup in containers other than Jetty. I have encountered this same error when running Solr 4.3 until I followed these steps to configure logging.
After changing the directory, did you also change the directory path in the solrconfig.xml file?
I just want to make sure: after making changes to the configuration file, did you restart Tomcat and the Solr server?
You need to copy the slf4j-log4j12-1.6.6.jar from the ext of Solr into the lib folder.
You also need to put the logging.properties file there.
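When a NoClassDefFoundError shows up even though the jar seems to be in place, one way to narrow it down is to ask the running JVM where it actually loads each class from. The sketch below is illustrative only: ClassLocator and locate are hypothetical names, and it assumes you can run the code inside the same web application (for example from a small test JSP or servlet), since the answer depends on the class loader that executes it.

```java
import java.security.CodeSource;

public class ClassLocator {
    /** Describes where a class was loaded from, or why it could not be loaded. */
    public static String locate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLocator.class.getClassLoader();
        }
        try {
            // initialize=false: resolve the class without running its static initializers
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null
                    ? className + " <- (no code source)"
                    : className + " <- " + src.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " NOT LOADABLE: " + e;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace in the question.
        System.out.println(locate("org.apache.log4j.spi.LoggingEvent"));
        System.out.println(locate("org.slf4j.impl.Log4jLoggerAdapter"));
    }
}
```

If the two classes resolve from different locations, or one of them is not loadable at all, that points to which jar needs to move.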

can't index rich documents on both solr 3.6 and solr 4.0 using update/extract getting "#500 lazy loading error"

I've just started to learn Solr, and for the last three days I've been stuck: I cannot index rich documents on Solr 3.6 or 4.0. I am using Windows 7 64-bit.
What I tried:
First I installed Solr 3.6 with Tomcat/Jetty using BitNami Apache.
1. Tried the -Durl command and got: error #500 lazy loading error
2. Downloaded curl for my Windows machine, tried curl, and got: error #500 lazy loading error
3. Copied a program from the Solr tutorial that uploads a file using SolrJ, ran it in the NetBeans IDE, and tried to index a PDF file using update/extract. Then I got:
org.apache.solr.common.SolrException: Server at "myServer:port/solr" returned non ok status:500, message:Internal Server Error
4. Changed solrconfig.xml to remove startup=lazy from the update/extract request handler, and got the same thing.
I re-installed Solr 3.6 but still can't get it to work. 4.0 gives the same error.
I have the same problem with some other request handlers too, such as /browse.
Should I switch to Linux?
Looks like the packager (Bitnami) did not include that library, even though they left Solr configured to use that library. You may ask them to resolve it. Or you can deploy it yourself.
Here's how to deploy Solr on Tomcat. It's equally easy to install on Windows, and it starts as a Windows service. Once installed, to enable rich document support, copy the contents of contrib/extraction/lib/ to a directory and point the sharedLib in solr.xml to that directory. If you have used that guide, you will understand those new terms :-)

NoClassDefFoundError MimeTypeException with PDF extraction

I am getting an exception trying to use update/extract with PDF files
My Set up is:-
Ubuntu Server 11.10
Tomcat 6
Solr 3.5.0.2011.11.22.15.54.38
I can browse to solr/admin OK
I have put all the contrib/extract and apache-solr-cell3.5.0.jar libraries into the tomcat folder webapps/solr/WEB-INF/lib
I am calling extract using:-
curl "http://localhost:8080/solr/update/extract?uprefix=attr_&fmap.content=attr_content&commit=true" -F "file=@/path/to/my.pdf"
error is
java.lang.NoClassDefFoundError: org/apache/tika/mime/MimeTypeException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:383)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:425)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:461)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:248)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:239)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
Would appreciate any pointers - the only time this error seems to come up elsewhere is with Nutch and cached results.
I have tried sending the mimetype in the querystring and also a *.doc file but got the same error.
According to the error message, what you get is not a MimeTypeException. The problem is a NoClassDefFoundError: Solr cannot load the class MimeTypeException.
Normally this class is present in tika-core.jar.
Make sure you actually have that file and also check if you have a lib statement in your solrconfig.xml pointing to the right directory.
This was due to a basic error: I had copied the necessary Tika libraries (to tomcat6/webapps/solr/WEB-INF/lib) but left the jar files owned by ROOT instead of chown-ing them to TOMCAT6. After setting the right ownership and restarting Tomcat, it started working OK.
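Both points above (is the jar with the class really there, and can the server read it) can be checked in one pass. The following is a rough sketch, not a Solr tool: JarScanner and scan are hypothetical names, the default directory is only a placeholder, and the readability check is meaningful only if you run it as the same user Tomcat runs as.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

public class JarScanner {
    /** Returns one report line per jar in libDir that is unreadable or contains the class. */
    public static List<String> scan(Path libDir, String className) throws IOException {
        String entry = className.replace('.', '/') + ".class";
        List<String> report = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                if (!Files.isReadable(jar)) {
                    // Typical symptom of wrong ownership or permissions
                    report.add("UNREADABLE " + jar);
                    continue;
                }
                try (JarFile jf = new JarFile(jar.toFile())) {
                    if (jf.getEntry(entry) != null) {
                        report.add("FOUND " + jar);
                    }
                }
            }
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder defaults; pass your own lib directory and class name.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "webapps/solr/WEB-INF/lib");
        String cls = args.length > 1 ? args[1] : "org.apache.tika.mime.MimeTypeException";
        scan(libDir, cls).forEach(System.out::println);
    }
}
```

An empty report means no readable jar in that directory contains the class. An UNREADABLE line matches the ownership problem described above.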
I found the solution to this problem. I was using SolrJ to update my PDF index.
After deploying Solr to Tomcat, I had not copied the extraction libraries into the Tomcat webapp, and I got all the lazy loading problems. I even tried getting Apache Tika separately, until I did this:
1. Shut down Tomcat.
2. Copy the libraries from \apache-solr-3.5.0\contrib\extraction to \apache-tomcat-7.0.26\webapps\solr\WEB-INF\lib
3. Start Tomcat.
Cheers

solr-cell search works for some pdfs not others

I have been searching for two days and have not been able to find an answer.
I have solr installed from the repos on an Ubuntu server running on tomcat 6. I have added the solr-cell jar and tika libraries.
I can run a curl command that works for some PDF files and indexes them fine, but it does not work for others. At first I thought that some files were corrupted, but that does not appear to be the case. I can't see any major difference between the ones that work and those that don't.
The error I get is a 500 error - see example here
The curl request I make is:
$ curl 'http://mysolrserver.com:port/solr/update/extract?map.content=text&map.stream_name=id&extractOnly=true&commit=true' -F "file=@/absolute/path/to/file.pdf"
This does work for some PDFs fine, just not others.
I believe I have solr 1.4.0 installed.
Any help would be appreciated - thank you
--EDIT--
I am using Ubuntu 10.04.1 if that helps at all.
A NullPointerException is probably a bug. Report it to PDFBox and/or Tika.
OK, the nightly snapshot of Solr uses PDFBox 1.3.1, whereas the current stable release uses 0.7.*, so there are a fair number of revisions between them.
I can index all the PDFs using this snapshot version of Solr, so this looks like something that will be fixed in the next stable version.

Solr + Jetty Gives HTTP 503 on Debian

(This is a cross-post from serverfault. I'm posting it here because no one answered my post there, and I feel that this question sits in an awkward space half-way between stackoverflow and serverfault.)
I have modified the example project included with Solr for my needs (removing things like the example stopwords and defining my own schema). Running this project on my mac, everything works fine: I can start Jetty and run search queries. But when I push the project out to a Debian system, I get this error when I try to do search queries:
HTTP ERROR: 503
SERVICE_UNAVAILABLE RequestURI=/solr
Powered by jetty://
The request log shows that a request was made:
10.10.124.14 - - [22/06/2010:22:34:52 +0000] "GET /solr HTTP/1.1" 503 1311
No error log is produced (at least not in the ./logs directory).
I have tried to run this project both on openjdk and the Sun JRE. Both started jetty fine, but produced the same error when searching. I am running Debian 9.0.4.
The issue is probably that the data store in Debian is /var/lib/solr/data. You need to set that path in your version of solrconfig.xml instead of the default, which is in the base directory /usr/share/solr/ and could be on a read-only file system.
I've packaged the latest Solr version in Debian Testing. It seems that there is some error in the Solr configuration, so that Jetty starts but can't start the Solr servlet. You must look in the Jetty error log to find the reason.
There's a lack of manpower in Java packaging for Debian, so it may well be that there is an error in the solr-jetty package.
The solr-jetty package in Debian stable doesn't work as I recall. Please try from Debian testing!
If you indeed find an error, please don't use random forums but post a bug on bugs.debian.org!
Success!
