Any way to see the inverted index of a doc in SOLR - solr

I am wondering if there is a way to look at the inverted index of a doc in SOLR? I checked solr admin tool but couldn't find anything.

Check the LukeRequestHandler in Solr; it should let you inspect the index data.
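As a rough sketch of what a LukeRequestHandler request could look like, the helper below only builds the URL. The base URL, the core name `collection1` and the key `doc-42` are made-up placeholders, and the `/admin/luke` path assumes the handler is registered at its usual location for that core. The `id`, `fl`, `numTerms` and `wt` parameters are ones the handler accepts. The response describes the document's fields and top terms; it is not a full postings dump.

```python
from urllib.parse import urlencode


def luke_url(base_url, core, doc_key=None, fields=None, num_terms=10):
    """Build a LukeRequestHandler URL for one core (no request is sent)."""
    params = {"wt": "json", "numTerms": num_terms}
    if doc_key is not None:
        # "id" is the document's uniqueKey value, not the internal Lucene doc id
        params["id"] = doc_key
    if fields:
        # restrict the report to the listed fields
        params["fl"] = ",".join(fields)
    return f"{base_url.rstrip('/')}/{core}/admin/luke?{urlencode(params)}"


# Hypothetical host, core and document key, for illustration only
print(luke_url("http://localhost:8983/solr", "collection1",
               doc_key="doc-42", fields=["title"]))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the field-level details for that document if the handler is enabled.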

There is also Luke, but lately it has not been kept up to date with the latest Lucene version. Someone will probably recompile it soon.

Related

Running with SOLR and Lucene together in Sitecore

Does anybody know how to run Lucene and Solr together in the same Sitecore installation?
Sitecore states that it is possible here:
https://doc.sitecore.net/sitecore_experience_platform/setting_up__maintaining/search_and_indexing/indexing/using_solr_or_lucene
You can mix Lucene and Solr, and, for example, use Solr for xDB and
Lucene for content search at the same time. If an index is small, it
is much easier to manage as a Lucene index because there is little to
no overhead to set it up.
But there is no reference on how to configure it.
Any advice is welcome.
Cheers!
In other words, your analytics indexes will use SOLR and your content search indexes will use Lucene.
To configure your analytic indexes to use SOLR, you can check the following documentation from Sitecore: https://doc.sitecore.net/sitecore_experience_platform/setting_up__maintaining/xdb/configuring_servers/configure_a_processing_server#_Solr_configuration
By default, Sitecore is already configured to use Lucene for Content Search, so no change is required there.
However, I am not sure that SOLR and Lucene can both be used for Content Search, or both for xDB, at the same time, because of how they are configured. For example, Content Search uses the master, web and core index configurations. If you decide to use SOLR for Content Search, you will need to disable the Lucene configuration file in the Include folder.
Thanks

Is there any configuration for Solr 5.3.1 that enables OpenNLP integration?

I saw there was an article in the Apache wiki on OpenNLP for Solr.
Is it valid for the current Solr version, 5.3.1?
No, if you have a look at LUCENE-2899, you'll see that the code discussed was never added to trunk. You'll have to download/patch/update the code yourself if you're going to have it native to Solr.
It's probably a better idea to do all the NLP stuff outside of Solr, then index the result in a form suited for the task you're trying to solve.
Yes. It's better to keep it outside.
Here is a small project I tried.
https://github.com/john77eipe/DeepQA

How to find out what Solr version DSE is using

I am trying to find out what Solr version our DSE setup is using. I know it uses a custom modified solr, but I want to know the index Lucene version.
Apart from opening an index with Luke, is there somewhere where DSE shows this info? I don't see it in the Solr admin overview.
EDIT: I want to find this out by looking at the setup itself, not from any documentation.
Check the release notes:
http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/RNdse.html
You can also see it in your system.log on startup.
Note: solr and lucene versions are the same now that they are a single project:
https://github.com/apache/lucene-solr/releases
In the solrconfig.xml, there is usually a line such as this:
<luceneMatchVersion>5.3.0</luceneMatchVersion>
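This is the Lucene version whose behavior the configuration is meant to match, so it gives you the minimum version of Lucene required; the version actually running may be newer.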
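If you only have the configuration files to go on, a minimal sketch like the following could pull that value out of `solrconfig.xml`. The sample XML is an invented fragment for illustration; a real file has many more elements. The value is what the config declares, which is not necessarily the version of the Lucene jars that are actually loaded.

```python
import xml.etree.ElementTree as ET


def lucene_match_version(solrconfig_xml):
    """Return the <luceneMatchVersion> text from solrconfig.xml content, or None."""
    root = ET.fromstring(solrconfig_xml)
    # luceneMatchVersion is a direct child of the root <config> element
    node = root.find("luceneMatchVersion")
    if node is None or not node.text:
        return None
    return node.text.strip()


# Invented minimal config, for illustration only
sample = """<config>
  <luceneMatchVersion>5.3.0</luceneMatchVersion>
</config>"""

print(lucene_match_version(sample))
```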

How to show the contents of files when searching in alfresco

When I search for particular content, the results show the file that contains it. How can I show the line in which that content appears?
I know Alfresco uses Lucene. Can I use the Lucene highlighter? If so, how do I use it in Alfresco?
What about Solr? Can I use that?
4.2.e without modifications means that you're using SOLR.
As far as I know, there is no add-on that adds hit highlighting to Alfresco's Solr search subsystem.
It's on the roadmap.
There are quite a few posts about hit highlighting in Alfresco based on Lucene.
Alfresco 5.2 seems to have this feature: the searched string is highlighted with context in the search results.

Simple Nutch 1.3/Solr index explanation

After much searching, it doesn't seem like there's any straightforward explanation of how to use Nutch 1.3 with Solr.
I have a Solr index with other content in it that I'll be using on a website for search.
I'd like to add Nutch results to the index, which will add external sites to the website's search.
All of this is working just fine.
The question is, how do you freshen the index? Do you have to delete all of the Nutch results from Solr first? Or does Nutch take care of that? Does Nutch remove results that are no longer valid from the Solr index?
Shell scripts with no documentation or explanation of what they are doing haven't been helpful with answering these questions.
The Nutch schema defines id (= url) as the unique key. If you re-crawl a URL, the document will be replaced in the Solr index when Nutch posts the data to Solr.
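The replace-on-same-key behavior can be pictured with a tiny in-memory model. This is not Solr or Nutch code, just a dictionary keyed by the unique key; the function name and sample documents are invented for illustration.

```python
def post_to_index(index, docs, unique_key="id"):
    """Toy model of an index: a document with an existing key overwrites the old one."""
    for doc in docs:
        index[doc[unique_key]] = doc
    return index


index = {}
# First crawl of a page
post_to_index(index, [{"id": "http://example.com/", "title": "Old title"}])
# Re-crawl of the same URL: same key, so the entry is replaced, not duplicated
post_to_index(index, [{"id": "http://example.com/", "title": "New title"}])

print(len(index), index["http://example.com/"]["title"])
```

Note what the model does not do: a page that has disappeared from the site is never posted again, so its old entry simply stays in the index until something deletes it.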
Well, you need to implement incremental crawling in Nutch, and how you do that depends on your application. Some people want to recrawl every day, others every 3 months. The max is 90 days in any case.
The general idea is to delete crawl segments that are older than your maximum recrawl interval, since they will be redundant by then, and then produce a fresh solrindex for use in Solr.
I'm afraid that you have to do that yourself with scripting. One day I may put some of the scripts I wrote for this on the wiki, but they are not ready for publishing as they stand.
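As a starting point for that scripting, here is a small sketch of the selection step only. It assumes segment directories are named with a `yyyyMMddHHmmss` timestamp; check your own crawl directory before relying on that. The function name and sample names are hypothetical, and the code only lists candidates and deletes nothing.

```python
from datetime import datetime, timedelta

# Assumed naming convention for segment directories (verify against your crawl dir)
SEGMENT_FORMAT = "%Y%m%d%H%M%S"


def stale_segments(segment_names, now, max_age_days=90):
    """Return segment names whose timestamp is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for name in segment_names:
        try:
            created = datetime.strptime(name, SEGMENT_FORMAT)
        except ValueError:
            continue  # not a timestamp-named entry; leave it alone
        if created < cutoff:
            stale.append(name)
    return sorted(stale)


# Hypothetical directory listing, for illustration only
names = ["20110101120000", "20110601120000", "notes.txt"]
print(stale_segments(names, now=datetime(2011, 7, 1)))
```

In a real script the names would come from listing the segments directory, and the returned entries would be removed before re-running the indexing step.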
Try Lucidworks' enterprise Solr for testing/prototyping, which has a webcrawler builtin.
http://www.lucidimagination.com/products/lucidworks-search-platform/enterprise
It'll give you a feel for the whole Lucene stack. It has a MUCH better interface than any other Java software I've ever used. It's a joy to use.
