luke with datastax solr - solr

I am trying to inspect Solr indexes in DSE with Luke, but I am getting the following error:
Invalid directory at the location,
check console for more information. Last exception:
java.lang.IllegalArgumentException:
An SPI class of type org.apache.lucene.codecs.Codec with name
'dse460' does not exist.
You need to add the corresponding JAR file supporting this SPI
to your classpath.
The current classpath supports the following names:
[Lucene40, Lucene3x, Lucene41, Lucene42, Lucene45,
Lucene46, Lucene49, Lucene410, SimpleText, Appending]
Has anyone used Luke with DataStax Solr indexes?

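As far as I know, this is currently not possible with Luke itself. As the error shows, the index uses a codec named 'dse460' that is not on Luke's classpath.
However, you can inspect the indexes if you enable LukeRequestHandler in DSE Search's solrconfig.xml, like this: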
<requestHandler name="/admin/luke" class="solr.admin.LukeRequestHandler" />
After that, you will be able to look inside the index through the Solr web interface:
http://<server-ip>:8983/solr/<keyspace.table>/admin/luke
P.S. See DSE Support article for more information about its usage.
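The same handler can also be queried from a script instead of a browser. The following is a minimal sketch in Python, assuming the handler is registered at /admin/luke as shown above; the helper names build_luke_url and fetch_luke_info are hypothetical, and the host and core values are placeholders. It uses the standard wt=json and numTerms request parameters.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_luke_url(host, core, num_terms=10, port=8983):
    """Build the URL of the /admin/luke handler for one core.

    In DSE Search the core name has the form "<keyspace>.<table>".
    """
    query = urlencode({"wt": "json", "numTerms": num_terms})
    return f"http://{host}:{port}/solr/{quote(core)}/admin/luke?{query}"


def fetch_luke_info(host, core, num_terms=10):
    """Fetch index and field statistics and return them as a dictionary."""
    with urlopen(build_luke_url(host, core, num_terms), timeout=10) as resp:
        return json.load(resp)
```

Calling fetch_luke_info("<server-ip>", "<keyspace.table>") against a running node should return the parsed response; the exact keys depend on the Solr version bundled with DSE.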

Related

Solr DynamicFields map to Sitecore Languages

The site is being migrated from Lucene to Solr for various reasons, and my knowledge of Solr is limited at this time. The implementation currently uses Sitecore 8.1 (Update 2) and Solr 4.10.0, per the compatibility table here.
First, the schema.xml was updated, again following Solution 1 here. After running "Generate the Solr Schema.xml file" in Sitecore's control panel, the schema.xml file contained a list of dynamicField elements corresponding to language codes. The initial assumption was that all the languages added to Sitecore would have been mapped, but this appears not to be the case. The list appears to be more in line with what Solr supports in the base instance. Language codes can be added to the schema.xml by hand, though that is tedious if the Sitecore instance has a large number of languages.
The primary concern is how the languages are mapped from Sitecore to Solr. Several of the language codes needed for dynamicField elements don't line up with the Sitecore languages, or even with the query string that the log shows in the error message. A couple of examples of the issue:
org.apache.solr.common.SolrException: ERROR: [doc=sitecore://master/{234456d1-1dcd-4b53-8b63-588d8b948a69}?lang=en-no&ver=1&ndx=sitecore_master_index] unknown field 'extension_t_nn'
org.apache.solr.common.SolrException: ERROR: [doc=sitecore://master/{ed3796b0-bb9f-44a4-801f-1c26ae7ca6c4}?lang=en-cn&ver=1&ndx=sitecore_master_index] unknown field 'height_t_zh'
It is unknown how en-no resolves to nn, or how en-cn resolves to zh. Understanding this would be ideal before simply adding these language codes to the schema.xml.
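This does not explain the mapping, but if the missing suffixes do end up being added by hand, the repetitive part can be scripted. The sketch below is only an illustration: the helper name dynamic_field_lines is hypothetical, and the field type text_general and the indexed/stored attributes are assumptions that should be checked against the dynamicField entries Sitecore already generated. Only the *_t_<code> naming pattern is taken from the error messages above.

```python
from xml.sax.saxutils import quoteattr


def dynamic_field_lines(language_codes, field_type="text_general"):
    """Return one <dynamicField> line per language suffix, sorted and de-duplicated.

    field_type is an assumed default, not a value taken from Sitecore.
    """
    lines = []
    for code in sorted(set(language_codes)):
        name = quoteattr(f"*_t_{code}")
        lines.append(
            f"<dynamicField name={name} type={quoteattr(field_type)} "
            'indexed="true" stored="true" />'
        )
    return lines


# Suffixes taken from the two "unknown field" errors above.
for line in dynamic_field_lines(["nn", "zh"]):
    print(line)
```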

How to find out which Solr version DSE is using

I am trying to find out which Solr version our DSE setup is using. I know it uses a custom, modified Solr, but I want to know the Lucene version of the index.
Apart from opening an index with Luke, is there anywhere DSE shows this information? I don't see it in the Solr admin overview.
EDIT: I want to find this by looking at the setup itself, not at any documentation.
Check the release notes:
http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/RNdse.html
You can also see it in your system.log on startup.
Note: Solr and Lucene versions are the same now that they are a single project:
https://github.com/apache/lucene-solr/releases
In the solrconfig.xml, there is usually a line such as this:
<luceneMatchVersion>5.3.0</luceneMatchVersion>
This tells Solr which Lucene version's behavior to match, so the bundled Lucene is at least that version; it is not necessarily the exact version in use.
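To read that value from a script rather than by eye, a small parser is enough. This is a minimal sketch, assuming <luceneMatchVersion> is a direct child of the root element of solrconfig.xml, as in the stock configuration; the function name lucene_match_version and the SAMPLE string are illustrative only.

```python
import xml.etree.ElementTree as ET


def lucene_match_version(solrconfig_xml):
    """Return the text of <luceneMatchVersion>, or None if it is absent."""
    root = ET.fromstring(solrconfig_xml)
    element = root.find("luceneMatchVersion")  # direct children of the root only
    if element is None or not element.text:
        return None
    return element.text.strip()


# Illustrative fragment; a real solrconfig.xml contains many more elements.
SAMPLE = """<config>
  <luceneMatchVersion>5.3.0</luceneMatchVersion>
</config>"""

print(lucene_match_version(SAMPLE))  # prints 5.3.0
```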

Sitecore Solr - IndexSchema no uniqueKey specified in schema

I have a small follow-up question that I believe is somewhat related to
Missing Id field in Solr index.
The issue is that search results contain duplicates of items that were edited; the number of duplicates depends on how many times the item was edited.
It seems that Sitecore doesn't remove the old item from the Solr index (the items have no versions).
Is this a Sitecore issue or some specific Solr behavior?
I see the following message in the Solr log; it may be connected:
WARN null IndexSchema no uniqueKey specified in schema.
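There should be a <uniqueKey> tag in the schema.xml file of every Solr core. Without it, Solr cannot replace the previously indexed copy of a document, so every re-index of an edited item adds another one: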
<uniqueKey>_uniqueid</uniqueKey>
It should be directly under the root <schema> tag (not inside <fields> or any other tag).
If you follow the guide for enabling Solr with Sitecore, it should be included in your schema.xml automatically.
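A quick way to confirm the placement described above is to parse the schema and look for <uniqueKey> among the direct children of <schema>. The following is a rough sketch; the function name find_unique_key and the two sample schemas are made up for illustration.

```python
import xml.etree.ElementTree as ET


def find_unique_key(schema_xml):
    """Return the uniqueKey field name if it sits directly under <schema>, else None."""
    root = ET.fromstring(schema_xml)
    if root.tag != "schema":
        raise ValueError("root element is not <schema>")
    element = root.find("uniqueKey")  # find() with a bare tag matches direct children only
    if element is None or not element.text:
        return None
    return element.text.strip()


# Correct placement: <uniqueKey> is a sibling of <fields>.
GOOD = "<schema><fields/><uniqueKey>_uniqueid</uniqueKey></schema>"
# Wrong placement: <uniqueKey> is nested inside <fields>.
BAD = "<schema><fields><uniqueKey>_uniqueid</uniqueKey></fields></schema>"

print(find_unique_key(GOOD))  # prints _uniqueid
print(find_unique_key(BAD))   # prints None
```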

solrcloud plugin deployment

I am going to migrate our app from Lucene 4.7 to Solr 4.7 (in cloud mode).
As we have some custom analysers, I would like to know how complicated the deployment process is with SolrCloud,
and specifically what it looks like with custom analysers.
I couldn't find any specific information. Can anybody help me?
Regards
If the jars are present on the hosts and on the Solr JVM classpath (or are referenced in the Solr config, as shown here: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/solrconfig.xml), then it should be no harder than specifying the field definitions, together with your custom analyzers, in the schema file.

How to configure Apache Tika with apache Solr 1.4.1

I want to index a large number of pdf documents.
I have found a reference showing that this can be done using Apache Tika, but unfortunately I cannot find any reference that describes how to configure Apache Tika in Solr 1.4.1.
Once I have it configured, how can I send documents to Solr directly without using curl?
I am using solrnet for indexing.
See ExtractingRequestHandler
Support for ExtractingRequestHandler in SolrNet is not yet complete. You can either finish implementing it, or work around it and craft your own HttpWebRequests.
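For orientation, the request that such a hand-crafted HttpWebRequest has to reproduce is a plain HTTP POST of the file bytes. The sketch below shows its shape in Python rather than C#, purely as an illustration: it assumes the handler is mapped to /update/extract as in the example solrconfig.xml, and the function name build_extract_request and the URL passed to it are hypothetical. The literal.id and commit parameters are standard ExtractingRequestHandler and update parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_url, doc_id, pdf_bytes):
    """Build a POST request that sends one PDF to the extracting handler.

    The handler path /update/extract is an assumption about solrconfig.xml.
    """
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    return Request(
        f"{solr_url.rstrip('/')}/update/extract?{params}",
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the document. When indexing a large number of PDFs, it is usually better to drop commit=true from the individual requests and commit once at the end.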
