I am trying to use the stream functionality of Solr:
http://127.0.1.1:8983/solr/ContentArticles/stream?expr=search(ContentArticles,qt=%22/export%22,fl=Title,sort=Title%20asc,q=%22Title:Iron*%22)
but I get the following error:
{
"result-set":{
"docs":[{
"EXCEPTION":"java.io.IOException: invalid expression - zkHost not found for collection 'ContentArticles'",
"EOF":true,
"RESPONSE_TIME":0}]}}
According to the Reference Manual, zkHost is optional. I do not use ZooKeeper, as this is a standalone Solr collection.
Where am I going wrong?
The stream functionality of Solr is only available in SolrCloud mode.
From the documentation:
"Streaming Expressions provide a simple yet powerful stream processing language for Solr Cloud."
From:
https://solr.apache.org/guide/6_6/streaming-expressions.html
I am trying to inspect Solr indexes in DSE with Luke, but I am getting the following error:
Invalid directory at the location,
check console for more information. Last exception:
java.lang.IllegalArgumentException:
An SPI class of type org.apache.lucene.codecs.Codec with name
'dse460' does not exist.
You need to add the corresponding JAR file supporting this SPI
to your classpath.
The current classpath supports the following names:
[Lucene40, Lucene3x, Lucene41, Lucene42, Lucene45,
Lucene46, Lucene49, Lucene410, SimpleText, Appending]
Has anyone used Luke with DataStax Solr indexes?
As far as I know, this is currently not possible with Luke itself...
But you can inspect the indexes if you enable the LukeRequestHandler in DSE Search's solrconfig.xml, like this:
<requestHandler name="/admin/luke" class="solr.admin.LukeRequestHandler" />
After that you'll be able to look inside the index by accessing the Solr web interface:
http://<server-ip>:8983/solr/<keyspace.table>/admin/luke
P.S. See the DSE Support article for more information about its usage.
In my Solr configuration files I have defined a DataImportHandler that fetches data from a MySQL database and also processes the contents of PDF files that are related to records in the SQL database. The data import works fine.
I'm trying to detect the language of the text contained in the files during the data import phase. I have specified a TikaLanguageIdentifierUpdateProcessorFactory in my solrconfig.xml as explained in https://wiki.apache.org/solr/LanguageDetection and have defined the language fields in my document schema. Nevertheless, after I run the import from the Solr admin UI, I cannot see any language field on my documents.
In all the examples I have seen, language detection is done by posting a document to Solr with the post command. Is it possible to do language detection with a DataImportHandler?
Once you have defined the UpdateRequestProcessor chain, you need to actually reference it from the request handler (the DataImportHandler's, in this case). You do that with the update.chain parameter.
Also, ensure that the chain includes the LogUpdate and RunUpdate processors; otherwise you are not indexing anything at all.
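As a rough sketch of how this could look in solrconfig.xml (the chain name "langid", the input field names, and the target "language" field are assumptions for illustration, not taken from the question):

<updateRequestProcessorChain name="langid">
  <!-- Detect the language of the assumed input fields and store it in the assumed "language" field -->
  <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
    <str name="langid.fl">title,text</str>
    <str name="langid.langField">language</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- Point the DataImportHandler at the chain so imported documents pass through it -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="update.chain">langid</str>
  </lst>
</requestHandler>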
I understand that Solr 5.0 provides a REST API to do real-time updates of the schema using curl. However, I could not do that with my earlier version, Solr 4.10.1.
I would like to check: is this functionality available for earlier versions of Solr, and is the curl syntax the same as in Solr 5.0?
According to the Solr Wiki, it is possible to read the schema via the REST API from Solr 4.2 onwards and to modify it starting from Solr 4.4:
In order to enable schema modifications via the Schema REST API, the schema implementation must be declared as managed by Solr, that is, not to be manually edited. Further, the schema must be configured as mutable in order to make modifications to it. Both of these schema features (managed and mutable) are configured via the <schemaFactory/> element in solrconfig.xml.
More information - https://wiki.apache.org/solr/SchemaRESTAPI
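As a rough sketch, the declaration that makes the schema managed and mutable could look like this in solrconfig.xml (managed-schema is the conventional resource name):

<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>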
I have a question regarding Solr document updates. For example, when two requests to update a document in Solr arrive at the same time, how does Solr behave?
Does it pick one request and lock writes until it is done, before the next request is processed?
Thanks in Advance
There are different locking mechanisms, as described in the Lucene LockFactory docs. By default NativeFSLockFactory is used, which acquires a file lock (write.lock) on the index directory so that only one IndexWriter can modify the index at a time; concurrent update requests are then serialized through that single writer rather than being applied simultaneously. The locking mechanism can be changed in solrconfig.xml.
Here is a snippet from solrconfig.xml:
<!-- LockFactory
This option specifies which Lucene LockFactory implementation
to use.
single = SingleInstanceLockFactory - suggested for a
read-only index or when there is no possibility of
another process trying to modify the index.
native = NativeFSLockFactory - uses OS native file locking.
Do not use when multiple solr webapps in the same
JVM are attempting to share a single index.
simple = SimpleFSLockFactory - uses a plain file for locking
Defaults: 'native' is default for Solr3.6 and later, otherwise
'simple' is the default
More details on the nuances of each LockFactory...
http://wiki.apache.org/lucene-java/AvailableLockFactories
-->
<lockType>${solr.lock.type:native}</lockType>
Are you talking about physical locks or logical version control? For logical version control, Solr 4+ supports optimistic concurrency using the _version_ field.
You can read about it:
Official documentation
Detailed writeup
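As a sketch, an optimistic-concurrency update in Solr's XML update format could look like the following (the id, field, and _version_ values are placeholders); the update is applied only if the supplied _version_ still matches the version stored in the index, otherwise Solr returns a conflict error:

<add>
  <doc>
    <field name="id">doc-1</field>
    <field name="title">Updated title</field>
    <!-- _version_ previously returned by Solr for this document; placeholder value -->
    <field name="_version_">1234567890123456789</field>
  </doc>
</add>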
I want to index a large number of PDF documents.
I have found a reference showing that this can be done using Apache Tika, but unfortunately I cannot find any reference that describes how to configure Apache Tika in Solr 1.4.1.
Once I have it configured, how can I send documents to Solr directly without using curl?
I am using SolrNet for indexing.
See ExtractingRequestHandler
Support for ExtractingRequestHandler in SolrNet is not yet complete. You can either finish implementing it, or work around it and craft your own HttpWebRequests.
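For reference, a rough sketch of enabling the handler in a Solr 1.4-era solrconfig.xml (the lib path and the target field name "text" are assumptions to adapt to your setup):

<!-- Make the Solr Cell / Tika extraction jars available; path is relative to the core's instance dir -->
<lib dir="../../contrib/extraction/lib" />

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Map the extracted body text into an assumed "text" field -->
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">attr_</str>
  </lst>
</requestHandler>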