Is it safe to turn off Jackrabbit's indexing feature?

We only use Jackrabbit for storing files, which we later retrieve by their full path or UUID. Is it safe to turn off the Jackrabbit index in this case?
And do Jackrabbit XPath queries use the fulltext index?

If you only access nodes by their path or UUID (identifier), you can safely turn off the Lucene index.
XPath queries, however, do need the Lucene index.
To reduce the overhead of maintaining the Lucene index, you could use a custom index configuration, and specifically index rules, to only index certain properties.
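For reference, the two index-free access patterns look like this with the plain JCR 2.0 API (a minimal sketch; the path and identifier are hypothetical placeholders, and session is an existing javax.jcr.Session):
import javax.jcr.Node;
import javax.jcr.Session;
// Both lookups are resolved through the persistence layer, not the search index:
Node byPath = session.getNode( "/content/files/report.pdf" ); // hypothetical path
Node byId = session.getNodeByIdentifier( "cafebabe-cafe-babe-cafe-babecafebabe" ); // hypothetical UUID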

Related

How Lucene works with Neo4j

I am new to Neo4j and Solr/Lucene. I have read that we can use Lucene queries in Neo4j. How does this work? What is the point of using Lucene queries in Neo4j?
I also need a suggestion: I need to write an application to search and analyse data. Which would help me more, Neo4j or Solr?
Neo4j uses Lucene as part of its legacy indexing. Right now, Neo4j supports several kinds of indexes, like labels on nodes and indexes on node properties.
But before Neo4j supported those new features, it primarily (and still) used Lucene for indexing. Most developers would create Lucene indexes on particular node properties, to enable them to use Lucene's query syntax to find nodes within a Cypher query.
For example, if you created an index according to the documentation, you could then search the index for particular values like this:
Index<Node> actors = graphDb.index().forNodes( "actors" ); // look up the legacy index
IndexHits<Node> hits = actors.get( "name", "Keanu Reeves" );
Node reeves = hits.getSingle();
It's lucene behind the scenes that's actually doing that finding.
In cypher, it might look like this:
start n=node:node_auto_index('name:M* OR name:N*')
return n;
In this case, you're searching a particular index for all nodes that have a name property starting with either an "M" or an "N". What's inside the single quotes is just a query in the Lucene query syntax.
OK, so that's how Neo4j uses Lucene. In recent versions, I only use these "legacy indexes" for fulltext indexing, which is where Lucene's strength is. If I just want fast equality checks (where name="Neo"), I use regular Neo4j schema indexes.
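As a point of comparison, here is a minimal sketch of that schema-index alternative, assuming the Neo4j 2.0-era embedded Java API and an existing GraphDatabaseService named graphDb:
import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
Label actor = DynamicLabel.label( "Actor" );
// Schema operations need their own transaction; index population happens in the background:
try ( Transaction tx = graphDb.beginTx() )
{
    graphDb.schema().indexFor( actor ).on( "name" ).create();
    tx.success();
}
// Fast equality lookup backed by the schema index:
try ( Transaction tx = graphDb.beginTx() )
{
    for ( Node n : graphDb.findNodesByLabelAndProperty( actor, "name", "Keanu Reeves" ) )
    {
        // use n
    }
    tx.success();
}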
As for Solr, I haven't seen it used in conjunction with Neo4j; maybe someone will jump in and provide a counter-example, but usually I think of Solr as running on top of a big Lucene index, and in the case of Neo4j, Lucene sits somewhere in the middle of the stack, so I'm not sure running Solr on top would be a good fit.
As for your need to write an application to search and analyze data, I can't give you a recommendation; either Neo4j or Solr might help, depending on your application and what you want to do. In general, use Neo4j when you need to express and search graphs. Use Solr when you need to organize and search large volumes of text documents.

What to be aware of when querying an index with Elasticsearch while indexing with Solr?

As part of a refactoring project I'm moving our querying end to Elasticsearch. The goal is to eventually refactor the indexing end to ES as well, but that is pretty involved, and the indexing part is running stably, so it has lower priority.
This leads to a situation where a Lucene index is created/indexed using Solr and queried using Elasticsearch. To my understanding this should be possible, since ES and Solr both create Lucene-compatible indexes.
Just to be sure, besides some housekeeping in ES to point to the correct index, is there any unforeseen trouble I should be aware of when doing this?
You are correct: a Lucene index is part of an Elasticsearch index. However, you need to consider that an Elasticsearch index also contains Elasticsearch-specific metadata, which will have to be recreated. The trickiest part of the metadata is the mapping, which will have to precisely match the Solr schema for all fields that you care about, and that might not be easy for some data types. Moreover, Elasticsearch expects to find certain internal fields in the index. For example, it wouldn't be able to function without a _uid field indexed and stored for every record.
In the end, even if you overcome all these hurdles, you might end up with a fairly brittle solution, and you will not be able to take advantage of many advanced Elasticsearch features. I would suggest looking into migrating the indexing portion first.
Have you seen ElasticSearch Mock Solr Plugin? I think it might help you in the migration process.

Distributed search in Solr

I am using Solr 1.3.0 to perform a distributed search over already existing Lucene indices. The question is: is there any way to find out which shard a result came from after the search?
P.S.: I am using the REST API.
For Solr sharding:
Documents must have a unique key and the unique key must be stored
(stored="true" in schema.xml)
Since the IDs need to be unique, the logic by which you feed the data to the shards should already be there on your side.
E.g. the simplest is an odd/even split, but you may have a more complex scheme by which you distribute the data across the shards.
You may be able to get some information using debugQuery=on, but if this is something that you'll query often I'd add a specific stored field for the shard name.
PS: Solr doesn't have a REST API.
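If you do add a stored field for the shard name, the query side looks roughly like this with the SolrJ client (a sketch against the Solr 1.3-era SolrJ API; the hosts and the shard_name field are hypothetical, and each shard is assumed to have written its own name into that field at index time):
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
CommonsHttpSolrServer server = new CommonsHttpSolrServer( "http://host1:8983/solr" );
SolrQuery query = new SolrQuery( "title:lucene" );
// Fan the query out to both shards:
query.set( "shards", "host1:8983/solr,host2:8983/solr" );
// Return the stored shard marker along with each hit:
query.set( "fl", "id,title,shard_name" );
QueryResponse response = server.query( query ); // exception handling omitted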

Can I use Solr just to search an existing Lucene index?

I use Lucene locally to index documents. I know how to use Lucene pretty well. I have never used Solr, but I want to run a web search using a Lucene index, so I'm now looking into it.
Can I install Solr on EC2, say, and then, instead of indexing documents through Solr, index them locally using Lucene directly and just copy the Lucene index from my machine to EC2 for Solr to use for search?
I'm assuming it's possible as long as I keep the index on disk but would like to be sure.
Thanks!
It's certainly possible; you would only have to make sure to maintain exactly the same index structure (defined by the Solr schema). However, it would also mean that your configuration would be stored in two completely separate places: e.g. each time you changed an analyzer in Lucene, you would need to synchronize that change in Solr's XML configuration. I'm not sure what benefit Solr would bring in such a use case.
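To make that concrete, here is a minimal sketch of the local indexing side, assuming a Lucene 3.x-era API and hypothetical field names; the Solr schema.xml would have to declare the same fields with the same analyzer:
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
IndexWriter writer = new IndexWriter(
    FSDirectory.open( new File( "/path/to/index" ) ),
    new IndexWriterConfig( Version.LUCENE_36,
        new StandardAnalyzer( Version.LUCENE_36 ) ) ); // must match the fieldType analyzer in schema.xml
Document doc = new Document();
doc.add( new Field( "id", "doc-1", Field.Store.YES, Field.Index.NOT_ANALYZED ) );
doc.add( new Field( "body", "some text to search", Field.Store.YES, Field.Index.ANALYZED ) );
writer.addDocument( doc );
writer.close(); // then copy /path/to/index into the Solr core's data/index directory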

Which is better: enabling indexing on an RDBMS or Lucene indexing?

I have an application which uses a traditional database for all of its data, and I need to develop search functionality. I did a small prototype with Lucene and the results are great. Now the bigger question arises: for each of the users' add/delete/update operations I need to update the DB and the Lucene index too. Will I get similar search performance if I just enable indexing on a few fields in the traditional DB instead of moving to Lucene? Is it worth the effort?
It depends entirely on the size of the corpus and on the type and frequency of updates.
A separate full-text search solution like Lucene gives you much more flexibility when tweaking relevance, and decoupling the updates of the RDBMS and the full-text index gives you more options when trying to optimize performance.
If you've never worked with Lucene before, I would strongly recommend using a higher-level solution such as Solr (or websolr), Sphinx, Elasticsearch or IndexTank; Lucene is very low level.
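To illustrate the decoupling point: one common pattern is to keep the database write synchronous and push index updates through a queue that a background thread drains. A rough sketch, with hypothetical saveToDatabase/toLuceneDoc helpers and an existing Lucene IndexWriter named writer; batching, durability and error handling are glossed over:
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
final BlockingQueue<Document> pending = new LinkedBlockingQueue<Document>();
// Request path: the DB write stays synchronous, indexing is deferred.
saveToDatabase( user );               // hypothetical DAO call
pending.offer( toLuceneDoc( user ) ); // hypothetical entity-to-Document mapper
// Background indexer thread drains the queue:
new Thread( new Runnable()
{
    public void run()
    {
        try
        {
            while ( true )
            {
                Document doc = pending.take();
                // updateDocument handles both adds and updates, keyed by unique id:
                writer.updateDocument( new Term( "id", doc.get( "id" ) ), doc );
                writer.commit(); // in practice, commit in batches
            }
        }
        catch ( InterruptedException e )
        {
            Thread.currentThread().interrupt();
        }
        catch ( java.io.IOException e )
        {
            // log and decide whether to retry or drop the update
        }
    }
} ).start();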
