Cannot index PDF file after I updated PDFBox from 1.8 to 2.0.2 - Solr

I am using PDFBox and Tika to index the content of PDF files.
Everything works fine with PDFBox 1.8, but after I updated PDFBox to 2.0.2 I get the error below:
(Thread-62 (HornetQ-client-global-threads-2071379348)) Exception while creating solr doucment for content::Failed to close temporary resources: org.apache.tika.exception.TikaException: Failed to close temporary resources
at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:152)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:149)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.hornetq.jms.client.JMSMessageListenerWrapper.onMessage(JMSMessageListenerWrapper.java:91)
at org.hornetq.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:983)
at org.hornetq.core.client.impl.ClientConsumerImpl.access$400(ClientConsumerImpl.java:48)
at org.hornetq.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1113)
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Could not delete temporary file C:\Users\FILESE~1\AppData\Local\Temp\apache-tika-7918716906396425097.tmp
at org.apache.tika.io.TemporaryResources$1.close(TemporaryResources.java:70)
at org.apache.tika.io.TemporaryResources.close(TemporaryResources.java:121)
at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:150)
... 18 more
Can you please help me resolve this issue?
I updated PDFBox to 2.0.2 because of this.
My Gradle dependencies are:
compile "org.apache.poi:poi:3.8"
compile "org.apache.poi:poi-ooxml:3.8"
compile "org.apache.poi:poi-scratchpad:3.8"
compile "org.apache.pdfbox:pdfbox:2.0.2"
compile 'org.apache.tika:tika-parsers:1.5'
compile 'org.apache.tika:tika-core:1.5'
Here I am using Tika 1.5, and this version supports PDFBox 2.0.3; you can see it here.

You use Tika version 1.5 and claim
Tika 1.5 supports pdfbox 2.0.3
This is extremely implausible, considering that Tika 1.5 was released in February 2014, long before there was a PDFBox 2.x, and that PDFBox 2.0.0 is incompatible with its earlier 1.8.x releases in multiple ways.
You point to the mvnrepository page for Apache Tika Parsers » 1.5 to support your claim. This page lists PDFBox 1.8.4 as a dependency and shows 2.0.3 as an available update.
But all this means is that Tika 1.5 has a dependency on PDFBox 1.8.4 and that there now exists a PDFBox version 2.0.3. It does not mean that Tika 1.5 properly functions with PDFBox 2.0.3.
Looking at the pom file, you'll see:
<dependency>
  <groupId>org.apache.pdfbox</groupId>
  <artifactId>pdfbox</artifactId>
  <version>1.8.4</version>
</dependency>
Thus, Tika 1.5 was developed and compiled against PDFBox 1.8.4. If the PDFBox version numbering is sensible, you can expect Tika 1.5 to work properly with any PDFBox 1.8.x where x >= 4.
However, the PDFBox developers took the opportunity to overhaul the PDFBox architecture in the 2.0.0 release. Most likely, therefore, no program that depends on a PDFBox 1.x version can work with PDFBox 2.x without changes.
According to the Tika issue TIKA-1959, Tika has been able to run with PDFBox 2.0.1 since version 1.13.
To make a long story short, therefore, you need at least version 1.13 if you want to use Tika with PDFBox 2.0.x.
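To make the rule above concrete, here is a minimal Java sketch that checks a Tika/PDFBox version pair. It encodes a rough heuristic derived from this answer, not an official compatibility matrix: Tika 1.13 and later go with PDFBox 2.x, earlier Tika versions go with PDFBox 1.x. It only covers the Tika 1.x / PDFBox 1.x-2.x era discussed here, and the class and method names (`TikaPdfBoxCompat`, `likelyCompatible`) are hypothetical.

```java
import java.util.Arrays;

public class TikaPdfBoxCompat {

    // Splits a dotted version such as "2.0.2" into integers.
    // Assumes purely numeric components (no "-SNAPSHOT" or similar suffixes).
    static int[] parse(String version) {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    // Numeric, component-wise comparison; missing components count as 0,
    // so "2.0" and "2.0.0" compare as equal.
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Heuristic (Tika 1.x era only): PDFBox 2.x needs Tika >= 1.13,
    // and Tika >= 1.13 expects PDFBox 2.x rather than 1.x.
    static boolean likelyCompatible(String tikaVersion, String pdfboxVersion) {
        boolean pdfbox2 = compare(pdfboxVersion, "2.0") >= 0;
        boolean tikaForPdfbox2 = compare(tikaVersion, "1.13") >= 0;
        return pdfbox2 == tikaForPdfbox2;
    }

    public static void main(String[] args) {
        System.out.println(likelyCompatible("1.5", "2.0.2"));  // false: the setup from the question
        System.out.println(likelyCompatible("1.13", "2.0.2")); // true
        System.out.println(likelyCompatible("1.5", "1.8.4"));  // true: the versions Tika 1.5 was built with
    }
}
```

Comparing the versions numerically rather than as strings matters here: as plain strings, "1.13" sorts before "1.5".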

Related

Upgrading Solr 6 to use Log4j 2.x

Is it possible to upgrade Solr 6 to use Log4j 2.x?
We have some external dependencies on Solr 6 but would like to upgrade Log4j to the latest.
From the Apache documentation:
Apache Solr releases prior to 7.4 (i.e. Solr 5, Solr 6, and Solr 7 through 7.3) use Log4J 1.2.17 which may be vulnerable for installations using non-default logging configurations that include the JMS Appender, see https://github.com/apache/logging-log4j2/pull/608#issuecomment-990494126 for discussion.
A good alternative would be to replace log4j 1.2.17 with reload4j, which is a drop-in replacement for the log4j jar file. It was developed by the original author of log4j in response to the vulnerabilities recently reported in log4j 1.x. You can find more information at the GitHub project.
I am aware of at least one open-source project (Alfresco) that is using it successfully.
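Before swapping in reload4j, you first need to locate the log4j 1.x jar in your Solr installation. The following Java sketch walks a directory tree and lists plain log4j 1.x jars (for example `log4j-1.2.17.jar`). The class name `Log4jJarScanner` and passing the Solr install directory as the first argument are assumptions; the actual jar location depends on your installation.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Log4jJarScanner {

    // Regex for plain log4j 1.x jar names such as log4j-1.2.17.jar. It does not match
    // reload4j jars or Log4j 2 artifacts such as log4j-core-2.17.1.jar or the
    // log4j-1.2-api bridge jar.
    private static final PathMatcher LOG4J1 =
            FileSystems.getDefault().getPathMatcher("regex:log4j-1\\.\\d+(\\.\\d+)*\\.jar");

    // Walks the directory tree under root and returns every log4j 1.x jar found.
    static List<Path> findLog4j1Jars(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> LOG4J1.matches(p.getFileName()))
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the Solr installation directory is passed as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path jar : findLog4j1Jars(root)) {
            System.out.println("log4j 1.x jar (candidate for reload4j): " + jar);
        }
    }
}
```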

Upgrading CakePHP from 2.8.0 to 2.10.20

Currently, I have a live project in CakePHP 2.8.0 running on PHP 5.6.40.
I need to upgrade CakePHP to the latest version. Can you suggest a newer version I could upgrade to without many changes? Should I upgrade to 2.10.20, which is the 2019 release? Also, I don't want to change the PHP version, which is currently 5.6.40.
Upgrades to CakePHP need to be done iteratively, i.e., you'd need to cover everything in the 2.9 migration guide before going through the 2.10 migration guide.
I'd suggest taking a copy of your code, installing the most recent version of CakePHP 2.10 for it, and then going through the CakePHP 2.9 migration guide followed by the CakePHP 2.10 migration guide.
Both 2.9 and 2.10 are compatible with PHP 5.6.
After doing this, you can consider upgrading to CakePHP 3, which is a much bigger upgrade.
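The iterative upgrade path can be expressed as a tiny Java sketch: given the current and target 2.x minor versions, it lists the migration guides to work through, in order. The class and method names are hypothetical; the point is only the ordering, one minor release at a time.

```java
import java.util.ArrayList;
import java.util.List;

public class CakeMigrationPath {

    // Lists the migration guides to work through, one per minor release,
    // when upgrading from major.fromMinor to major.toMinor.
    static List<String> guides(int major, int fromMinor, int toMinor) {
        List<String> steps = new ArrayList<>();
        for (int minor = fromMinor + 1; minor <= toMinor; minor++) {
            steps.add(major + "." + minor + " migration guide");
        }
        return steps;
    }

    public static void main(String[] args) {
        // CakePHP 2.8.0 -> 2.10.x
        System.out.println(guides(2, 8, 10)); // [2.9 migration guide, 2.10 migration guide]
    }
}
```

Treating the minor versions as integers avoids a classic pitfall: compared as strings, "2.10" sorts before "2.9".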

How to resolve properly transitive dependencies of Tika in Fuse (camel) bundle?

I'm trying to implement Tika functionality in a Fuse (6.3) project. In the current latest version, 1.16, Tika offers an OSGi bundle with the parsers. I can't find the proper OSGi way to include Tika in my project. Any hints on how to create the dependency configuration and use the OSGi bundle?
The Camel-Tika component is not included in JBoss Fuse 6.3. Tika 1.16 is used by the camel-tika component included in Camel 2.20.0-SNAPSHOT.

Is it possible to use camel-2.16.2 with servicemix 5.6.0 and karaf -2.4.4?

In our application we have encountered an idempotency issue which is fixed by https://issues.apache.org/jira/browse/CAMEL-9480. We are currently using ServiceMix 5.5.0 with the following dependencies:
Activiti 5.17.0
Apache ActiveMQ 5.11.1
Apache Camel 2.15.2
Apache CXF 3.0.4
Apache Karaf 2.4.3
So, in order to get the fix for CAMEL-9480, I tried to upgrade Camel to 2.16.2 by upgrading ServiceMix to 5.6.0, which comes with the following dependencies:
Activiti 5.19.0.2
Apache ActiveMQ 5.12.3
Apache Camel 2.16.2
Apache CXF 3.1.4
Apache Karaf 2.4.4
After the upgrade, I get the following error during deployment:
[caused by: Unable to resolve 295.0: missing requirement [295.0] osgi.wiring.package; (&(osgi.wiring.package=org.eclipse.jetty.util.log)(version>=9.2.0)(!(version>=10.0.0)))]
When I checked camel-jetty-2.16.2.pom, it uses camel-jetty9, and the Karaf features file mentions the following:
<details>camel-jetty9 intend to work with jetty9, so this feature only works in the karaf container which support jetty9, e.g. karaf 4.x</details>
So is it possible to use Camel 2.16.2 with Karaf 2.4.4, or do we need to upgrade to Karaf 4.x?
There are two bundles and two features:
camel-jetty, which supports Jetty 8 (and Karaf 2/3)
camel-jetty9, which supports Jetty 9 and Karaf 4
However, it also depends on the other Camel features you use. Some features use a jetty feature without a version, while others use camel-jetty9 or camel-jetty directly, etc. Sometimes it is necessary to rewrite a feature to fix a version range, but in most cases it works out of the box.
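The requirement in the deployment error is an OSGi filter that only matches an exported org.eclipse.jetty.util.log package whose version lies in the range [9.2.0, 10.0.0). As a rough Java sketch (hypothetical class `OsgiRangeCheck`; it compares only the numeric major.minor.micro parts and ignores qualifiers), you can see why a Jetty 8 export, which is what camel-jetty for Karaf 2/3 works with, cannot satisfy it. The sample version strings in `main` are illustrative.

```java
public class OsgiRangeCheck {

    // Compares the numeric major.minor.micro parts of two OSGi versions.
    // A fourth qualifier part (e.g. "v20150310") is ignored in this sketch.
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < 3; i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Mirrors the filter (version>=9.2.0)(!(version>=10.0.0)), i.e. [9.2.0, 10.0.0).
    static boolean satisfies(String exportedVersion) {
        return compare(exportedVersion, "9.2.0") >= 0
                && !(compare(exportedVersion, "10.0.0") >= 0);
    }

    public static void main(String[] args) {
        for (String v : new String[] {"8.1.16", "9.2.10.v20150310", "10.0.0"}) {
            System.out.println(v + " -> " + satisfies(v));
        }
    }
}
```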

Solr 3.1 Admin Interface

I am using Solr 1.4; by default it comes with a web application deployed as "solr.war" in the webapps directory of the example.
I tried to upgrade to Solr 3.1, but this version (3.1) doesn't have the web admin interface attached.
I copied solr.war from 1.4 and put it in webapps in 3.1,
but this launches Solr 1.4 rather than 3.1. I want to use the edismax feature in 3.1, along with other features and improvements.
How can I use edismax? Or could you give me the regular steps to upgrade to Solr 3.1, or even to compile it from source?
Thank you
Solr 3.1 does have a web admin interface. The web application archive is in the dist directory of the official distribution; the file is named apache-solr-3.1.0.war.
Alternatively, download the source package for 3.1 and run "ant example" to build the example Solr application.
The new WAR for 3.1 should be under the solr/example/webapp directory.
You will need to add the following setting to your solrconfig.xml:
<luceneMatchVersion>LUCENE_31</luceneMatchVersion>
Also make sure your "unique id" field in schema.xml is of type string.
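If you want to verify the upgraded configuration programmatically, a minimal Java sketch using the JDK's built-in DOM parser can read the luceneMatchVersion value from solrconfig.xml. The class and method names are hypothetical, and the sample configs in `main` are stripped-down strings, not real Solr configuration files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class LuceneMatchVersionCheck {

    // Returns the trimmed text of the first <luceneMatchVersion> element,
    // or null if the configuration does not contain one.
    static String luceneMatchVersion(String solrconfigXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(solrconfigXml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("luceneMatchVersion");
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String upgraded = "<config><luceneMatchVersion>LUCENE_31</luceneMatchVersion></config>";
        String missing = "<config></config>";
        System.out.println(luceneMatchVersion(upgraded)); // LUCENE_31
        System.out.println(luceneMatchVersion(missing));  // null
    }
}
```

In practice you would read the real file (for example with `Files.readString` on Java 11+) and pass its contents in; the string inputs here just keep the sketch self-contained.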
