What is going to replace the DataImportHandler in Solr 9.0? - solr

The latest documentation (8.8) for Apache Solr says that the Data Import Handler is deprecated and is going to be removed in future versions. It only lists a third-party plugin maintained on GitHub, and no other native alternative for importing data from relational databases. Am I missing something, or has Solr dropped native support for this?

Data import handler will still be there, but as a contrib package.
I'll do what I did before data import handler existed and write a separate program. I prefer Python, where reading a database and sending JSON updates is pretty simple.
Be sure to batch the updates.
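As a rough illustration of the "separate program" approach, here is a minimal sketch using only the Python standard library. It assumes a SQLite source database and a core named mycore on a local Solr; the URL, the batch size, and every function name are hypothetical placeholders, not part of any Solr tooling.

```python
import json
import sqlite3
import urllib.request

# Assumptions for illustration only: a local Solr with a core called "mycore".
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"
BATCH_SIZE = 500  # assumed value; tune for your document size


def batches(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def rows_as_docs(cursor):
    """Turn DB-API rows into dicts keyed by column name."""
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


def post_batch(docs, url=SOLR_UPDATE_URL):
    """POST one batch of documents to Solr as a JSON array."""
    request = urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def import_table(connection, query, url=SOLR_UPDATE_URL, size=BATCH_SIZE):
    """Read rows with `query` and send them to Solr one batch at a time."""
    cursor = connection.execute(query)  # sqlite3 shortcut; other drivers need a cursor
    sent = 0
    for batch in batches(rows_as_docs(cursor), size):
        post_batch(batch, url)
        sent += len(batch)
    return sent
```

Something like import_table(sqlite3.connect("books.db"), "SELECT id, title FROM books") would drive it, where the database file and column names are again made up. The documents only become searchable after a commit, for example by adding commit=true to the final update request.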

Alternative solr DIH
https://github.com/saro-lab/solr-db-importer
I made an alternative program to DIH and released the source code and manual.

I want to use your solr-db-importer,
but it seems to stay stuck at running.
I suspect the loading of the schema is not working:
http://localhost:8983/solr/succesboeken_sb/sb-schema
I get:
Exception in thread "main" org.springframework.web.client.HttpClientErrorException$NotFound: 404 Not Found: " Searching for Solr? You must type the correct path. Solr will respond."
while executing:
java -jar solr-db-importer-1.4.jar
I am using Solr 9.1.0
Have you encountered this before?

Related

MultiRequestHandler not working in solr

I was trying to run multiple queries in a single go when I came across something called MultiRequestHandler. I put the request handler in the solrconfig.xml file and restarted Solr.
<requestHandler name="/multi" class="solr.MultiRequestHandler"/>
I'm getting the error:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Error loading class 'solr.MultiRequestHandler'
My version of Solr is 5.4.0. Does my version of Solr not support MultiRequestHandler?
There never was a MultiRequestHandler added to Solr. The ticket tracking its addition was resolved as Won't Fix.
The patch is available on GitHub, but hasn't been updated in three years, so it might not apply cleanly to 5.4.x.
Newer versions of Solr do however have Streaming Expressions, which could be what you're looking for.

Is there any configuration for Solr 5.3.1 that enables OpenNLP integration?

I saw there was an article in the Apache wiki on OpenNLP for Solr.
Is it valid for the current Solr version, 5.3.1?
No, if you have a look at LUCENE-2899, you'll see that the code discussed was never added to trunk. You'll have to download/patch/update the code yourself if you're going to have it native to Solr.
It's probably a better idea to do all the NLP stuff outside of Solr, then index the result in a form suited for the task you're trying to solve.
Yes. It's better to keep it outside.
Here is a small project I tried.
https://github.com/john77eipe/DeepQA

Upgrade Solr from 4.2.1 to 5.3.1

I've been tasked with migrating from our Solr 4.2.1 server to a new Solr server, 5.3.1. I was hoping I could just pick up the cores and move them over with a little bit of file editing. But alas, I can't quite figure it out.
I have tried moving a single core and creating a core.properties file with the name of the core, and I get:
testcore: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error loading class 'solr.JsonUpdateRequestHandler'
Any thoughts as to what the problem might be? Any thoughts would be appreciated, thank you!
I am in the final stages of a similar upgrade; here is how I suggest you proceed.
Install both versions side by side and create the collection in the new Solr.
Take your default schema/solrconfig from the new solr and move stuff into it from your old schema/solrconfig. The formatting changed, so you will need to manually move all of your config.
Make sure that works
Move the indexes - once your solrconfig and schema match up you should be able to use your old indexes (data directory).
To complete the upgrade you will need to re-index into a new but similar collection. This will upgrade the underlying Lucene indexes. Your new version of Solr has cursor mark support, so this becomes much simpler, especially if you are using collection aliases.
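The re-index step can be sketched with a cursorMark loop. This is only an outline under stated assumptions: the uniqueKey field is called id, every field you need is stored (so documents can be read back in full), and the function names are hypothetical. The cursorMark and nextCursorMark parameters themselves are standard Solr.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, cursor_mark, rows=500):
    """Fetch one page of results from a collection using cursorMark paging."""
    params = urllib.parse.urlencode({
        "q": "*:*",
        "sort": "id asc",  # cursorMark needs a sort on the uniqueKey; "id" is assumed
        "rows": rows,
        "cursorMark": cursor_mark,
        "wt": "json",
    })
    with urllib.request.urlopen(f"{base_url}/select?{params}") as response:
        return json.load(response)


def iter_all_docs(fetch, start="*"):
    """Yield every document, following nextCursorMark until it stops changing."""
    cursor_mark = start
    while True:
        page = fetch(cursor_mark)
        yield from page["response"]["docs"]
        next_mark = page["nextCursorMark"]
        if next_mark == cursor_mark:  # same mark returned: no more results
            break
        cursor_mark = next_mark
```

With a placeholder URL, iter_all_docs(lambda mark: fetch_page("http://localhost:8983/solr/old_collection", mark)) walks the whole collection, and each document can then be posted to the new collection. Fields such as _version_ usually need to be dropped before re-posting.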
JSON does not have its own request handler any longer (changed in 4.x, removed in 5.x). It has now been merged into the standard solr.UpdateRequestHandler, and the request handler is selected internally based on the Content-Type header of the request.
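To make the Content-Type point concrete, here is a small sketch that builds two update requests against the same /update path, differing only in the header. The base URL and core name are assumptions, and build_update_request is a hypothetical helper, not a Solr API.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8983/solr/testcore"  # assumed location of the core


def build_update_request(base_url, body, content_type):
    """Build (but do not send) a POST to /update with the given Content-Type."""
    return urllib.request.Request(
        f"{base_url}/update",
        data=body.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )


# Same handler path for both; only the Content-Type header differs.
json_request = build_update_request(
    BASE_URL, json.dumps([{"id": "1"}]), "application/json"
)
xml_request = build_update_request(
    BASE_URL, '<add><doc><field name="id">1</field></doc></add>', "application/xml"
)
```

Passing either object to urllib.request.urlopen would send it. In solrconfig.xml, the old solr.JsonUpdateRequestHandler entry is then no longer needed.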

Disappearing cores in Solr

I am new to Solr.
I have created two cores from the admin page, let's call them "books" and "libraries", and imported some data there. Everything works without a hitch until I restart the server. When I do so, one of these cores disappears, and the logging screen in the admin page contains:
SEVERE CoreContainer null:java.lang.NoClassDefFoundError: net/arnx/jsonic/JSONException
SEVERE SolrCore REFCOUNT ERROR: unreferenced org.apache.solr.core.SolrCore#454055ac (papers) has a reference count of 1
I was testing my query in the admin interface; when I refreshed it, the "libraries" core was gone, even though I could normally query it just a minute earlier. The contents of solr.xml are intact. Even if I restart Tomcat, it remains gone.
Additionally, I was trying to build a query similar to this: "Find books matching 'war peace' in libraries in Atlanta or New York". So given cores "books" and "libraries", I would issue "books" the following query (which might be wrong, if it is please correct me):
(title:(war peace) blurb:(war peace))
AND _query_:"{!join
fromIndex=libraries from=libraryid to=libraryid
v='city:(new york) city:(atlanta)'}"
When I do so, the query fails and the "libraries" core disappears, with the above symptoms. If I re-add it, I can continue working (as long as I don't restart the server or issue another join query).
I am using Solr 4.0; if anyone has a clue what is happening, I would be very grateful. I could not find out anything about the meaning of the error message, so if anyone could suggest where to look for that, or how to go about debugging this, it would be really great. I can't even find where the log file itself is located...
I would avoid the Debian package, which may be misconfigured and quirky. It also contains (a very early build of?) Solr 4.0, which may itself have lingering issues as the first release of a new major version. The package maintainer may not have incorporated the latest and safest Solr release into the package.
A better way is to download Solr 4.1 and set it up yourself with Tomcat or another servlet container.
If you are looking to install and configure Solr 4.0, you can follow the installation procedure from here
Update the solr config for the cores to be persistent.
In your solr.xml, update <solr> or <solr persistent="false"> to <solr persistent="true">

Using Solr to read OpenGrok's database and failing with "no segments* file found"

I need a simple way to read OpenGrok's DB from a PHP script to do some weird searches (as doing that in Java in OpenGrok itself isn't within my abilities). So I decided to use Solr as a way to query the Lucene DB directly from another language (probably PHP or C).
The problem is that when I point Solr to /var/opengrok/data, it bombs out with:
java.lang.RuntimeException: org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.MMapDirectory#/var/opengrok/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory#3a329572: files: [] at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1103)
(etc, etc, the backtrace is about three screens long)
I tried to point it somewhere inside data with no luck. The structure looks like this:
/var/opengrok/data/index/$projname/segment*
/var/opengrok/data/spelling...
and it seems that whatever Solr is using expects the segment files directly in the index directory.
I checked to see if there's any version discrepancy, but OpenGrok 0.11 is using Lucene 3.0.2 and I've set Solr to LUCENE_30 as the database version.
Any pointers will be greatly appreciated; Google didn't seem to be able to help with this.
OpenGrok's web interface can consume any well-formed search query (through the URL) and reply with XHTML results, which are easy to parse, so you're probably making it too complex by hacking inside Lucene rather than using the UI provided ...
