How Lucene works with Neo4j

I am new to Neo4j and Solr/Lucene. I have read that we can use Lucene queries in Neo4j. How does this work? What is the use of Lucene queries in Neo4j?
I also need a suggestion. I need to write an application to search and analyse data. Which would help me more, Neo4j or Solr?

Neo4j uses Lucene as part of its legacy indexing. Right now, Neo4j supports several indexing mechanisms, such as labels on nodes and indexes on node properties.
Before Neo4j supported those newer features, it relied primarily on Lucene for indexing, and it still uses it. Most developers would create Lucene indexes on particular node properties so they could use Lucene's query syntax to find nodes within a Cypher query.
For example, if you created an index as described in the documentation, you could then search the index for a particular value like this:
IndexHits<Node> hits = actors.get( "name", "Keanu Reeves" );
Node reeves = hits.getSingle();
It's Lucene behind the scenes that's actually doing the lookup.
In Cypher, it might look like this:
start n=node:node_auto_index('name:M* OR name:N*')
return n;
In this case, you're searching a particular index for all nodes that have a name property starting with either an "M" or an "N". What's inside the single-quoted expression is just a query written in the Lucene query syntax.
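To make the semantics of that query concrete, here is a rough plain-Java sketch of what 'name:M* OR name:N*' selects. It does not use Lucene or Neo4j at all: the class and method names are hypothetical, and a sorted map stands in for the index, under the assumption that a prefix clause simply matches every indexed value beginning with the given characters.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only: this is not the Lucene or Neo4j API.
class PrefixQuerySketch {
    // Sorted map from a "name" value to the ids of the nodes carrying it.
    private final TreeMap<String, Set<Long>> names = new TreeMap<>();

    public void add(long nodeId, String name) {
        names.computeIfAbsent(name, k -> new TreeSet<>()).add(nodeId);
    }

    // Ids of nodes whose name starts with the prefix (the role of name:M*).
    public Set<Long> prefix(String prefix) {
        Set<Long> ids = new TreeSet<>();
        // Keys sharing a prefix are contiguous in sorted order, so scan
        // from the prefix and stop at the first key that no longer matches.
        for (Map.Entry<String, Set<Long>> e : names.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            ids.addAll(e.getValue());
        }
        return ids;
    }

    // Union of two prefix clauses (the role of name:M* OR name:N*).
    public Set<Long> prefixOr(String first, String second) {
        Set<Long> ids = prefix(first);
        ids.addAll(prefix(second));
        return ids;
    }

    public static void main(String[] args) {
        PrefixQuerySketch index = new PrefixQuerySketch();
        index.add(1L, "Morpheus");
        index.add(2L, "Neo");
        index.add(3L, "Trinity");
        System.out.println(index.prefixOr("M", "N"));
    }
}
```

The sorted map is only a stand-in; the point is that the quoted expression is evaluated against indexed values, not against each node in turn.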
OK, so that's how Neo4j uses Lucene. In recent versions, I only use these "legacy indexes" for fulltext indexing, which is where Lucene's strength is. If I just want fast equality checks (where name="Neo") then I use regular Neo4j schema indexes.
As for Solr, I haven't seen it used in conjunction with Neo4j; maybe someone will jump in and provide a counter-example. I usually think of Solr as running on top of a big Lucene index. With Neo4j sitting in the middle, between you and the Lucene index, I'm not sure running Solr would be a good fit.
As for your application to search and analyze data, I can't give you a recommendation: either Neo4j or Solr might help, depending on your application and what you want to do. In general terms, use Neo4j when you need to express and search graphs. Use Solr when you need to organize and search large volumes of text documents.

Related

Is Solr better than a normal RDBMS for ordinary queries, i.e. not full-text search?

I am developing a web application where I want to use Solr for search only and keep my data in another database.
I will have two databases: one relational (SQL Server), and a copy of it in the NoSQL Solr database.
I'll be searching on specific fields in the Solr documents (e.g. by id, name, type, and join queries), i.e. NOT full-text search.
I know Solr's strength is full-text search, which it achieves by creating an inverted index on the document data. What I want to know is whether it also helps in my case, by creating another type of index on my documents that makes ordinary searching faster than a SQL Server index.
Yes, it can help you.
You need to consider what your requirements and priorities are.
If you add Solr as an additional store used for searching the application data, you will have to keep it constantly updated from the database, and you will need additional infrastructure to run it.
If performance is your main criterion and you don't want to put any search load on your RDBMS, then you can add Solr to your system. Also consider how big your data in the RDBMS is, because RDBMS systems are themselves strong enough to support searching data.
Weigh all of the above aspects when making the decision.

What to be aware of when querying an index with Elasticsearch when indexing with Solr?

As part of a refactoring project I'm moving our querying end to Elasticsearch. The goal is to eventually move the indexing end to ES as well, but that is pretty involved and the indexing part is running stably, so it has lower priority.
This leads to a situation where a Lucene index is created and populated using Solr and queried using Elasticsearch. To my understanding this should be possible, since ES and Solr both create Lucene-compatible indexes.
Just to be sure: besides some housekeeping in ES to point to the correct index, is there any unforeseen trouble I should be aware of when doing this?
You are correct: a Lucene index is part of an Elasticsearch index. However, you need to consider that an Elasticsearch index also contains Elasticsearch-specific index metadata, which will have to be recreated. The trickiest part of the metadata is the mapping, which will have to precisely match the Solr schema for all fields that you care about, and that might not be easy for some data types. Moreover, Elasticsearch expects to find certain internal fields in the index. For example, it wouldn't be able to function without the _uid field being indexed and stored for every record.
In the end, even if you overcome all these hurdles, you might end up with a fairly brittle solution, and you will not be able to take advantage of many advanced Elasticsearch features. I would suggest looking into migrating the indexing portion first.
Have you seen the ElasticSearch Mock Solr Plugin? I think it might help you in the migration process.

Is it advisable to use Lucene for this?

I have a huge XML file, about 2GB in size, containing resumes. There are thousands of resumes in this file, tagged properly. Right now I am using XPath to query it. Is it advisable to use Lucene for this instead of XPath?
It depends on what your requirements are. If you need full-text searching and all the other great features of a full-blown search engine, Lucene is the way to go. I would recommend Solr, which builds on top of Lucene and provides a much better API and abstraction.
Like everything else technology-related, it depends.
What Lucene gives you that you're not getting with XPath is the power of a full-text engine that supports, among other things, ranking, phrase queries, and wildcard queries.
Based on your use case I would say that a full-text search engine makes sense. That's not to say that vanilla Lucene is the best way to go (there are, for example, other alternatives that build on Lucene).
2GB seems small enough that I would construct my own (minimal) inverted index :) However, there is no problem with using Lucene/Solr. Go ahead. It will help you once your record count starts doubling. That said, at this scale (2GB) or even much larger, many real-life systems get by with database full-text search or the SQL LIKE keyword.
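For a sense of what a minimal inverted index looks like, here is a hedged sketch in plain Java. The class name, the tokenization rule (lowercase, split on anything that is not a letter or digit), and the AND-only search are all assumptions made for illustration; this is not how Lucene is implemented, and it has no ranking, phrase, or wildcard support.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a minimal in-memory inverted index.
class TinyInvertedIndex {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Lowercase the text, split on non-alphanumeric runs, record each term.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Ids of the documents that contain every one of the given terms.
    public Set<Integer> search(String... terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> docs = postings.getOrDefault(
                    term.toLowerCase(Locale.ROOT), Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);   // first term seeds the result
            } else {
                result.retainAll(docs);         // later terms narrow it
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Java developer with Lucene experience");
        index.add(2, "Python developer");
        System.out.println(index.search("java", "lucene"));
    }
}
```

In the resume scenario, each resume element parsed out of the XML file would be one document; how the ids are assigned is left open here.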

Can I use Solr just to search an existing Lucene index?

I use Lucene locally to index documents. I know how to use Lucene pretty well. I have never used Solr, but I want to run a web search using a Lucene index, so I'm now looking into it.
Can I install Solr on, let's say, EC2, and then, instead of indexing documents through Solr, index them locally using Lucene directly and just copy the Lucene index from my machine to EC2, where Solr will use it for search?
I'm assuming it's possible as long as I keep the index on disk but would like to be sure.
Thanks!
It's certainly possible; you would only need to make sure to maintain exactly the same index structure (as defined by the Solr schema). However, it would also mean that your configuration is stored in two completely separate places: each time you changed an analyzer in Lucene, for example, you would need to synchronize that change in the Solr XML configuration. I'm not sure what benefit Solr would bring in such a use case.

Which is better: enabling indexing on the RDBMS, or Lucene indexing?

I have an application which uses a traditional database for all of its data, and I need to develop search functionality. I did a small prototype with Lucene and the results are great. Now the bigger question arises: for each of the user's add/delete/update operations I need to update both the database and the Lucene index. Will I get similar search performance if I just enable indexing on a few fields in the traditional database instead of moving to Lucene? Is it worth the effort?
It depends entirely on the size of the corpus and on the type and frequency of updates.
A separate full-text search solution like Lucene gives you much more flexibility when tweaking relevance, and decoupling the updates of the RDBMS from those of the full-text index gives you more options when trying to optimize performance.
If you've never played with Lucene, I would strongly recommend that you use a more high-level solution, like Solr (or websolr), Sphinx, ElasticSearch or IndexTank. Lucene is very low level.
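One way to picture the decoupling of database and index updates mentioned above is the sketch below. Everything in it is hypothetical: two in-memory maps stand in for the RDBMS table and the search index, and the method names are made up. The assumption is that writes go to the database immediately, while the index is brought up to date later, in batches.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch: the maps stand in for an RDBMS table and a search index.
class DecoupledIndexing {
    private final Map<Integer, String> database = new HashMap<>();
    private final Map<Integer, String> searchIndex = new HashMap<>();
    // Ids whose index entry is stale and waiting to be refreshed.
    private final Queue<Integer> pending = new ArrayDeque<>();

    // Write path: update the database now, defer the index update.
    public void save(int id, String text) {
        database.put(id, text);
        pending.add(id);
    }

    public void delete(int id) {
        database.remove(id);
        pending.add(id);
    }

    // Meant to run periodically or from a background worker.
    // Returns how many queued ids were processed.
    public int flushToIndex() {
        int processed = 0;
        Integer id;
        while ((id = pending.poll()) != null) {
            String current = database.get(id);
            if (current == null) {
                searchIndex.remove(id);        // row is gone: drop it from the index
            } else {
                searchIndex.put(id, current);  // row exists: (re)index its latest text
            }
            processed++;
        }
        return processed;
    }

    public String indexed(int id) {
        return searchIndex.get(id);
    }

    public int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        DecoupledIndexing app = new DecoupledIndexing();
        app.save(1, "first document");
        System.out.println("pending before flush: " + app.pendingCount());
        app.flushToIndex();
        System.out.println("indexed after flush: " + app.indexed(1));
    }
}
```

The trade-off is that search results lag behind the database until the next flush, in exchange for cheaper writes and the freedom to batch index updates.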
