I am working on a project right now that has a Solr index of counts and IDs. I am currently researching whether it is possible to increment/decrement values in Solr directly, instead of having to retrieve the data, increment it with PHP, and then reinsert it into Solr.
I have spent an hour googling variations of this to no avail. Any information would be most appreciated.
Thanks.
No, as far as I know it's not possible. You could certainly implement this in Solr as a request handler that retrieves the document from the underlying Lucene index, updates the field, then writes it back to the index and commits, but doing this too frequently will probably kill your performance. This is not really what Lucene/Solr were designed for. Consider using something like Redis for this particular feature, and leave Lucene/Solr for full-text search, where it really shines.
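If you do move the counters into Redis, the update becomes a single atomic command instead of a fetch-modify-reindex round trip. Here is a minimal sketch in Java, assuming the Jedis client and a count:<id> key scheme (both are my own choices for illustration, not from the question):

```java
// Minimal sketch: counters in Redis, documents stay in Solr.
// Assumes the Jedis client library; key names are made up.
import redis.clients.jedis.Jedis;

public class CounterStore {
    private final Jedis jedis = new Jedis("localhost", 6379);

    // Atomically increment the counter for a document id; Redis INCR
    // creates the key with value 1 if it does not exist yet.
    public long increment(String docId) {
        return jedis.incr("count:" + docId);
    }

    public long decrement(String docId) {
        return jedis.decr("count:" + docId);
    }

    public long get(String docId) {
        String value = jedis.get("count:" + docId);
        return value == null ? 0L : Long.parseLong(value);
    }
}
```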
For statistical purposes I have to save and analyse all search queries made to a server running Solr (version 8.3.1). Maybe it's just because I haven't worked with Solr until today, but I couldn't find a simpler way to access these queries other than crawling the logs.
I've only found one article to help me, in which the following is stated:
I think that Solr by itself doesn't store the queries (correct me if I'm wrong about this), but you can accomplish what you want by processing the Solr log (it's the only way, I think).
(source: https://lucene.472066.n3.nabble.com/is-it-possible-to-save-the-search-query-td4018925.html)
Is there any more convenient way to do this?
I actually found a good way to achieve this in another SO question. Well, at least kind of.
Note: this is only useful if you have enough resources on the same server, or on another server, to properly handle a second Solr core.
Link to original answer
That SO question is about Elasticsearch, but its methodology can also be applied to this case, using a second Solr core that indexes the queries made. (One can also add extra fields, such as when a query was last searched, its total search count, and so on.)
The functionality of search auto-completion is also achievable with this solution.
In short:
The basic idea is to use a second Solr core to provide a fast means of saving the queries (instead of a database, for instance).
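As a rough sketch of that write path with SolrJ, assuming a second core named "queries" and illustrative field names:

```java
// Sketch of logging searches into a second Solr core via SolrJ.
// The core name ("queries") and field names are assumptions.
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class QueryLogger {
    private final SolrClient queriesCore =
            new HttpSolrClient.Builder("http://localhost:8983/solr/queries").build();

    // Call this alongside every search against the main core.
    public void log(String queryString) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", queryString);            // query text doubles as the key,
                                                    // so repeat searches overwrite
        doc.addField("query_s", queryString);
        doc.addField("last_searched_dt", new java.util.Date());
        queriesCore.add(doc);                       // let autoCommit handle commits
    }
}
```

On Solr 8 you could presumably also keep a running total search count in that core via an atomic update with the inc operation, rather than overwriting the document each time.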
Remark: I'm not going to accept this as the best answer, because it is a rather specific solution to my original question. But I nonetheless felt it could be useful for any programmer trying to achieve this while also thinking about search auto-completion.
I have a Solr core with 100K-1,000K documents.
I have a scenario where I need to add or set a field value on most documents.
Doing it through Solr takes too much time.
I was wondering if there is a way to do such a task with the Lucene library, accessing the Solr index directly (with less overhead).
If needed, I can shut down the core, run my code, and reload the core afterwards (hoping it will take less time than doing it through Solr).
It would be great to hear if someone has already done such a thing, and what the major pitfalls are along the way.
A similar problem has been discussed multiple times on the Lucene Java mailing list. The underlying problem is that you cannot update a document in place in Lucene (and hence in Solr).
Instead, you need to delete the document and insert a new one. This obviously adds the overhead of analyzing, merging index segments, etc. Yet the specified number of documents isn't anything major and should not take days (have you tried updating Solr with multiple threads?).
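On the multi-threading point: SolrJ ships a client that does exactly this. A sketch, with illustrative queue/thread sizes and a placeholder document source:

```java
// Sketch: push re-indexed documents through ConcurrentUpdateSolrClient,
// which batches updates and sends them on several worker threads.
// URL, queue size, and thread count are illustrative starting points.
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkUpdater {
    public static void main(String[] args) throws Exception {
        ConcurrentUpdateSolrClient client =
                new ConcurrentUpdateSolrClient.Builder("http://localhost:8983/solr/mycore")
                        .withQueueSize(10000)
                        .withThreadCount(4)
                        .build();

        for (SolrInputDocument doc : loadDocumentsWithNewField()) {
            client.add(doc);         // queued; flushed in batches by worker threads
        }
        client.blockUntilFinished(); // wait for the queue to drain
        client.commit();
        client.close();
    }

    // Placeholder: produce full documents that include the new/updated
    // field, since Lucene/Solr replace whole documents rather than
    // patching them in place.
    private static Iterable<SolrInputDocument> loadDocumentsWithNewField() {
        throw new UnsupportedOperationException("supply your own source");
    }
}
```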
You can of course try doing this via Lucene and see if it makes any difference, but you need to be absolutely sure you use the same analyzers as Solr does.
I have a scenario where I need to add or set a field value on most documents.
If you have to do this often, maybe you should look at things like ExternalFileField. It has limitations, but it may be better than hacking around Solr's infrastructure by going directly to Lucene.
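For reference, a rough sketch of what the schema side of ExternalFileField looks like; the field and type names here are illustrative, not prescribed:

```xml
<!-- schema.xml sketch: a field whose values live outside the index -->
<fieldType name="externalPopularity" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="popularity" type="externalPopularity" indexed="false" stored="false"/>
```

The values then live in a plain text file (external_popularity in the core's data directory, one id=value line per document) that can be swapped out and reloaded when a new searcher opens, rather than requiring a reindex. The main limitation is that such fields are only usable in function queries and sorting, not for searching directly.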
I have a very large Solr index. I want to tag all documents with terms that better represent each document, like this. Does this type of clustering result also count as document tagging?
Which approach is better: index-time document tagging, or query-time document tagging as carrot2 does?
Query-time tagging has the obvious drawback that it makes every query more expensive.
However, the clustering results at query time are supposedly better, because at that time, more information has been seen and user feedback can be incorporated.
Note that technically, this is probably more frequent pattern mining than cluster analysis.
Maybe you should just try this variant of frequent pattern mining on your whole data set. You might not even need to store which documents were tagged which way: the Solr engine should already be optimized to retrieve them again when needed.
I understand from your question that you want to know how to implement something similar to carrot2 faceting using Solr.
IMO you can add a multivalued tag field to your documents (see this Stack Overflow question for an example) containing the cluster names for each doc, and then build facets on that field as explained in the Solr wiki here and here.
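As a concrete illustration of the facet side, a small SolrJ sketch; the core URL and the "tag" field name are assumptions:

```java
// Sketch: facet on a multivalued "tag" field to get tag/cluster counts.
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TagFacets {
    public static void main(String[] args) throws Exception {
        SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

        SolrQuery query = new SolrQuery("*:*");
        query.setFacet(true);
        query.addFacetField("tag");   // the multivalued tag field
        query.setFacetMinCount(1);    // skip tags with no matching docs

        QueryResponse response = client.query(query);
        FacetField tags = response.getFacetField("tag");
        for (FacetField.Count count : tags.getValues()) {
            System.out.println(count.getName() + ": " + count.getCount());
        }
        client.close();
    }
}
```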
As part of a refactoring project I'm moving our querying end to Elasticsearch. The goal is to eventually refactor the indexing end to ES as well, but this is fairly involved, and since the indexing part is running stably it has lower priority.
This leads to a situation where a Lucene index is created/indexed using Solr and queried using Elasticsearch. To my understanding this should be possible, since ES and Solr both create Lucene-compatible indexes.
Just to be sure: besides some housekeeping in ES to point to the correct index, is there any unforeseen trouble I should be aware of when doing this?
You are correct that a Lucene index is part of an Elasticsearch index. However, you need to consider that an Elasticsearch index also contains Elasticsearch-specific metadata, which will have to be recreated. The trickiest part of that metadata is the mapping, which will have to precisely match the Solr schema for all fields you care about, and that might not be easy for some data types. Moreover, Elasticsearch expects to find certain internal fields in the index; for example, it cannot function without a _uid field indexed and stored for every record.
In the end, even if you overcome all these hurdles, you might end up with a fairly brittle solution, and you will not be able to take advantage of many advanced Elasticsearch features. I would suggest looking into migrating the indexing portion first.
Have you seen the ElasticSearch Mock Solr Plugin? I think it might help you in the migration process.
I would like to be able to search a CouchDB database using Solr. Are there any projects that provide such an integration?
I am also aware of CouchDB-Lucene. Is there a way to hook Solr into that?
Thanks!
It would make more sense to roll your own, given how easy it is. First you need to decide what kind of Solr schema to use and how to map your CouchDB documents onto that schema. Then simply iterate through all the documents in the database (see Pagination in CouchDB?) and generate Solr <add> documents.
People do this all the time, with all kinds of data sources. Since Solr essentially searches a single table, the hard work is often figuring out how to map your database format onto that single table. Read up on what you can do with the Solr schema, and you may be surprised at how easy this is.
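To make the loop concrete, here is a bare-bones sketch in Java 11+, assuming the Jackson library for JSON; the database name, core name, and two-field mapping are placeholders:

```java
// Sketch: page through CouchDB's _all_docs and post <add> docs to Solr.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CouchToSolr {
    static final HttpClient HTTP = HttpClient.newHttpClient();
    static final ObjectMapper JSON = new ObjectMapper();

    public static void main(String[] args) throws Exception {
        int batch = 500;
        long offset = 0;
        while (true) {
            // skip/limit is the simplest pagination; startkey is faster
            // on large databases (see the linked question).
            String url = "http://localhost:5984/mydb/_all_docs?include_docs=true"
                    + "&limit=" + batch + "&skip=" + offset;
            JsonNode rows = JSON.readTree(get(url)).get("rows");
            if (rows == null || rows.size() == 0) break;

            StringBuilder add = new StringBuilder("<add>");
            for (JsonNode row : rows) {
                JsonNode doc = row.get("doc");
                add.append("<doc>")
                   .append(field("id", doc.get("_id").asText()))
                   .append(field("title", doc.path("title").asText("")))
                   .append("</doc>");
            }
            add.append("</add>");
            post("http://localhost:8983/solr/mycore/update", add.toString());
            offset += rows.size();
        }
        post("http://localhost:8983/solr/mycore/update", "<commit/>");
    }

    // Escape the characters that would break the <add> XML.
    static String field(String name, String value) {
        String escaped = value.replace("&", "&amp;").replace("<", "&lt;");
        return "<field name=\"" + name + "\">" + escaped + "</field>";
    }

    static String get(String url) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return HTTP.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    static void post(String url, String body) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "text/xml")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HTTP.send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```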
There is a CouchDB integration for ElasticSearch available, apart from feeding ElasticSearch with JSON on your own. Both work with schema-less JSON, so it's very easy to integrate them.
In terms of features, ElasticSearch would offer a comparable set to Solr (in addition to some unique features, of course).
According to http://wiki.apache.org/couchdb/Related_Projects there was a CouchDB-Solr2 project (scroll down to the end of the page), which is no longer maintained.