Document tagging

Document tagging - solr

I have very huge solr index. I want to tag all documents with terms which better represent that document like this. Does this type of clustering results is also come under document tagging?
Which approach is better, Index time Document tagging or Query time document tagging like carrot2 ?

Query time has the obvious drawback that this makes the query more expensive.
However, the clustering results at query time are supposedly better, because at that time, more information has been seen and user feedback can be incorporated.
Note that technically, this is probably more frequent pattern mining than cluster analysis.
Maybe you should just try this variant of frequent pattern mining on your whole data set. You might not even need to store which documents were tagged which way - the solr engine should already be optimized to retrieve them again when needed.

I understood from your question that you want to know how to implement something similar to carrot2 faceting using solr.
IMO you can add a multivalued field tag to your documents (see this Stack Overflow Question for an example) with the cluster names for that doc, and then build facets using that field as explained in Solr wiki here and here.

Related

usage of solr in ecommerce area what to index and what not to index

I work in an e-commerce area dealing with a huge [300,000] no of products and related data like attributes, price, promotions, collections.
We are thinking to index all the information in solr and query from solr instead of any DB hits. We are thinking of going this way to provide good performance, easy sorting, and faceting features [some example]. We also have delta indexing for near real-time accuracy.
Is this a good decision or it should be a mix of solr and DB. Need some suggestions.

It all depends on your requirement. Solr is not a full replacement to your RDBMS.
You can migrate the relevant or important data to solr from RDBMS using the DIH feature of solr.
Whatever searching you want to execute you can perform on the solr server by HTTP requests.
You can also achieve sorting and faceting with solr.
And yes Delta indexing is also possible in solr.
Your decision of migrating the search relevant data is correct in order to improve the search performance of your application.
Yes it should be mix of RDBMS and solr for your application.

Are there any nosql database could do search(like lucene) on map/reduce

I'm using cloudant which I could use mapreduce to project view of data and also it could search document with lucene
But these 2 feature is separate and cannot be used together
Suppose I make a game with userdata like this
{
name: ""
items:[]
}
Each user has item. Then I want to let user find all swords with quality +10. With cloudant I might project type and quality as key and use query key=["sword",10]
But it cannot make query more complex than that like lucene could do. To do lucene I need to normalize all items to be document and reference it with owner
I really wish I could do a lucene search on a key of data projection. I mean, instead of normalization, I could store nested document as I want and use map/reduce to project data inside document so I could search for items directly
PS. If that database has partial update by scripting and inherently has transaction update feature that would be the best

I'd suggest trying out elasticsearch.
Seems like your use case should be covered by the search api
If you need to do more complex analytics elasticsearch supports aggregations.

I am not at all sure that I got the question correctly, but you may want to take a look at riak. It offers a solr-based search, which is quite well documented. I have used it in the past for distributed search over a distributed key-value index and it was quite fast.
If you use this, you will also need to take a look at the syntax of solr queries, so I add it here to save you some time. However, keep in mind that not all of those solr query functionalities were available in riak (at least that was when I used it).

There are several solutions that would do the job. I can give my 2 cents proposing the well established MongoDB. With MongoDB you can create a text-Index on a given field and then do a full text Search as explained here. The feature is in MongoDb since version 2.4 and the syntax is well documented on MongoDB docs.

Is it possible to retrieve Hbase data along with Solr data?

I have the pipeline of Hbase, Lily, Solr and Hue setup for search and visualization. I am able to search on the data indexed in Solr using Hue, except I cannot view all the required data since I do not have all the fields from Hbase stored in Solr. I'm not planning on storing all of the data as well.
So is there a way of retrieving those fields from Hbase along with the Solr response for visualizing the data with Hue?
From what I know, I believe it is possible to setup the Solr searchhandler to perform this, but I haven't been able to find a concrete example to help me understand better(I am very new to both Solr and Hbase, so examples help)
My question is similar to this question. But I am unable to comment there for further information.
Current Solution thanks to suggestion by Romain:
Used HTML widget to provide a link for each record in Hue Search page back to the Hbase record on the Hbase Browser.

One of the approach is, fetch the required id from the solr, and then get the actual data from Hbase. Well solr gives you the count based on your query and also some faceting features. Once those are fetched, and you always have the data in Hbase. Solr is best for index search. So given the speed and space compromise, this design can help. Another main reason is Hbase gives you good fetch times for entire row, when searched based on row key. So, the overall performance depends on your Hbase row key design also.
i think you are using lily Hbase indexer if I am not wrong. so by default the doc id is the hbase row key, which might make things easy

What to be aware of when querying an index with Elasticsearch when indexing with SOLR?

As part of a refactoring project I'm moving our quering end to ElasticSearch. Goal is to refactor the indexing-end to ES as well in the end, but this is pretty involved and the indexing part is running stable so this has less priority.
This leads to a situation where a Lucene index is created / indexed using Solr and queried using Elasticsearch. To my understanding this should be possible since ES and SOlR both create Lucene-compatable indexes.
Just to be sure, besides some housekeeping in ES to point to the correct index, is there any unforseen trouble I should be aware of when doing this?

You are correct, Lucene index is part of elasticsearch index. However, you need to consider that elasticsearch index also contains elasticsearch-specific index metadata, which will have to be recreated. The most tricky part of the metadata is mapping that will have to be precisely matched to Solr schema for all fields that you care about, and it might not be easy for some data types. Moreover, elasticsearch expects to find certain internal fields in the index. For example, it wouldn't be able to function without _uid field indexed and stored for every record.
At the end, even if you will overcome all these hurdles you might end up with fairly brittle solution and you will not be able to take advantage of many advanced elasticsearch features. I would suggest looking into migrating indexing portion first.
Have you seen ElasticSearch Mock Solr Plugin? I think it might help you in the migration process.

Can solr be used to increment/decrement?

I am working on a project right now that has a solr index of counts and ids. I am currently researching if it is possible to increment/decrement on solr directly, instead of having to retrieve the data, increment it with PHP, and then reinsert it into solr.
I have spent an hour googling variations of this to no avail. Any information would be most appreciated.
Thanks.

No, as far as I know it's not possible. You could certainly implement this in Solr as a request handler, which would retrieve the document from the underlying Lucene index, update the field, then write it back to the index and commit, but doing this too frequently will probably kill your performance. This is not really what Lucene/Solr were designed for. Consider using something like Redis instead, for this particular feature, and leave Lucene/Solr for full-text search, where it really shines.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Document tagging - solr

I have very huge solr index. I want to tag all documents with terms which better represent that document like this. Does this type of clustering results is also come under document tagging? Which approach is better, Index time Document tagging or Query time document tagging like carrot2 ?

Related

usage of solr in ecommerce area what to index and what not to index

Are there any nosql database could do search(like lucene) on map/reduce

Is it possible to retrieve Hbase data along with Solr data?

What to be aware of when querying an index with Elasticsearch when indexing with SOLR?

Can solr be used to increment/decrement?

Categories

Resources