I have to perform a search across several fields (e.g. ProductName, ProductDescription, FeedbackOfProduct, etc.).
Currently I have two approaches:
1. Copy all these searchable fields into one copy field and search on that field.
But the problem here is: how can I then boost a particular field, say only ProductName?
2. Or search by field name and apply boosts accordingly:
ProductName:"Test"^50.0 ProductDescription:"Easy To Handle"~100^70.0
Please tell me which is the best approach.
Thanks in advance.
With the 2nd option (search by field with boosts) you have more control over how documents are scored; as you note, you do not have this control with the 1st option. Either way is a valid approach, and which one you use will depend on your use case.
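If you go with the 2nd option, it can help to build the boosted query string in one place so the per-field boosts are easy to tune. A minimal sketch, using the field names and boost values from the question (the proximity operator is left out for simplicity):

```python
# Build a Lucene-syntax query with per-field phrase boosts.
def boosted_query(terms_by_field):
    """terms_by_field: {field: (phrase, boost)} -> Lucene query string."""
    clauses = []
    for field, (phrase, boost) in terms_by_field.items():
        clauses.append(f'{field}:"{phrase}"^{boost}')
    return " ".join(clauses)

q = boosted_query({
    "ProductName": ("Test", 50.0),
    "ProductDescription": ("Easy To Handle", 70.0),
})
# q == 'ProductName:"Test"^50.0 ProductDescription:"Easy To Handle"^70.0'
```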
Not sure if this is a relevant query to post, but I want to understand whether auto-suggestion is a suitable option for location-based search, as I have a specific requirement. The requirement is: from a specified geo location, I want to search for providers (be it a doctor with a specialty, or hospitals) using auto-suggestion.
As part of the suggestion, I need to pass the geo location along with the search key. The search key would be a doctor's name, a doctor's specialty, a hospital name, or a hospital address, and the suggester should return results ordered by geo distance, ascending.
The weight would be calculated as the inverse of the distance.
I posted an earlier query here (solr autosuggestion with tokenization); this post is related to that one.
Regards
Venkata Madhu
If you want to add more logic to the suggestions you're going to show, it is probably a good idea to use normal queries instead of the suggest component.
For instance, take a look at this repo; it is a (bit outdated) example of using a normal Solr core to store suggestions and run suggest-like queries. That means you can do partial-match queries on that index and add the custom scoring logic that you want. Keep in mind that it doesn't need to be a separate core: you could just copy data from the fields that you have into a separate field used only for generating the suggestions.
In this case, you'll only need to add/edit the score function to include your own logic (geodist) or even sort hard on the distance.
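As a rough sketch of that last idea, here is how the query parameters for a suggest-like request sorted by distance might be assembled. The core setup, the prefix field `name_prefix`, and the spatial field `location` are assumptions; `{!geofilt}`, `sfield`, `pt`, `d`, and `geodist()` are standard Solr spatial parameters:

```python
from urllib.parse import urlencode

def build_suggest_query(prefix, lat, lon):
    """Build Solr params: prefix match, filtered to a radius, nearest first."""
    params = {
        "q": f"name_prefix:{prefix}*",   # partial-match field (assumed name)
        "fq": "{!geofilt}",              # restrict results to a radius
        "sfield": "location",            # spatial field (assumed name)
        "pt": f"{lat},{lon}",            # the user's position
        "d": "50",                       # radius in km
        "sort": "geodist() asc",         # nearest suggestions first
        "rows": "10",
    }
    return urlencode(params)

qs = build_suggest_query("cardio", 40.71, -74.0)
```

The resulting string would be appended to a normal `/select` request rather than the suggest handler.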
On the Employees database table, I'm using a field called SearchTags, in which I'm going to add the employee's information like FullName + PassportNo + Nationality + JobTitle etc.
And to search for a particular employee, I'll search within that field (SearchTags).
What do you think about this method?
Isn't that considered information duplication?
In my opinion this method is very easy to code and straightforward.
So, I'd like to know your opinion before I start using this method :)
I am assuming that you are using SQL to perform the search.
What do you think about this method?
I don't mean to sound harsh, but I completely disagree with your approach.
Isn't that considered information duplication?
Of course it is, and duplication is not at all recommended by database design fundamentals.
Problems you will have to face:
What if you want to update one of those individual fields? For example, when the job title changes, how will you handle it? You will have to update it in two places.
A new requirement down the road will demand that you search only 3 of those fields, not four. What would you do? Create another field with duplicates of just those 3 target fields?
SQL is simple enough to formulate a query to target multiple fields to search.
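To illustrate that last point, here is a minimal sketch using an in-memory SQLite table as a stand-in for the real database. The table and column names mirror the question; the row data is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employees (
    FullName TEXT, PassportNo TEXT, Nationality TEXT, JobTitle TEXT)""")
conn.execute(
    "INSERT INTO Employees VALUES ('John Smith', 'P123', 'UK', 'Engineer')")

# One query targets all four columns directly -- no duplicated
# SearchTags column needed, and nothing to keep in sync on updates.
term = "%Smith%"
rows = conn.execute(
    """SELECT FullName FROM Employees
       WHERE FullName LIKE ? OR PassportNo LIKE ?
          OR Nationality LIKE ? OR JobTitle LIKE ?""",
    (term, term, term, term)).fetchall()
# rows == [('John Smith',)]
```

If the set of searchable columns changes later, only this one query changes, not the data.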
I have an application that contains a set of text documents that users can search. Every user must be able to search based on the text of the documents. What's more, users must be able to define custom tags and associate them with a document. Those tags are used in two ways:
1) Users must be able to search for documents based on specific tag ids.
2) There must be facets available for the tags.
My solution was adding a multivalued field to each document to act as an array containing the tag ids that the document has been tagged with. So far so good. I was able to perform queries based on text and tag ids (for example text:hi AND tagIds:56).
My question is: would that solution work in production, in an environment where users add but also remove tags from documents? Remember, I have to have the data available in real time, so whenever a user removes/adds a tag I have to reindex that document and commit immediately. If that's not a good solution, what would be an alternative?
Stack Overflow uses Solr, in case you doubt Solr's abilities in production mode.
And although I couldn't find much information on how they implemented tags, I don't think your approach is wrong. Yes, tagged documents will have to be reindexed (which means a slight delay), but other than that I don't see anything wrong with it.
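One way to soften the reindexing cost is Solr's atomic updates, which let you add or remove a single tag id without resending the whole document (this requires the fields to be stored or have docValues, and an updateLog to be configured). A sketch of the payload, with a made-up document id:

```python
import json

def tag_update(doc_id, tag_id, op):
    """op is 'add' or 'remove', mapped to Solr's atomic-update verbs."""
    return json.dumps([{"id": doc_id, "tagIds": {op: [tag_id]}}])

payload = tag_update("doc1", 56, "add")
# POST this JSON to /solr/<core>/update, with commit=true for immediate
# visibility, or commitWithin for a softer real-time guarantee.
```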
Edit
Can Solr do fuzzy field collapsing? I.e. collapsing fields that have similar values, rather than identical ones?
I'd assumed that it could, but now I'm not sure, which makes my original question below invalid.
Original Question
For a large given set of values I need to decide which is the most prevalent. The set of all values will change over time, and so I can expect that the output may change over time too.
I gather Solr can do "field collapsing" to group results by a given field, with a tolerance of similarity. Would it be possible, nay even appropriate, to use Solr solely to collapse fields, to derive the most common value? We use Solr in other parts of the business, and it would be good to leverage existing code rather than home-brew a custom solution.
No, Solr does not support fuzzy collapsing (at least not based on what is documented on the wiki).
Solr 4.0 supports group.func, which allows you to group results based on the result of a FunctionQuery, so it's possible that at some point a function could be created to get you approximately what you want, but none of the existing functions will do it.
However, Solr does support result clustering, which may work for your use case. Clustering is done with Carrot2. If you limit the fields used by Carrot2 to a single field, you may get a result similar to "fuzzy clustering", but you have far less control over what Carrot2 does than you do with field collapsing.
For a normal document you might want all your fields analyzed by carrot, e.g.:
carrot.title=my_title&carrot.snippet=my_title,my_description
But if you have, for example, a manufacturer field with slight variations in spelling or punctuation, it might work to give carrot only that single field for both title and snippet:
carrot.title=manufacturer&carrot.snippet=manufacturer
I've set up my first 'installation' of Solr, where each document represents a musical work (with properties like number (int), title (string), version (string), composers (string) and keywords (string)). I've set the field 'title' as the default search field.
However, what do I do when I would like to query all fields? I'd like to give users the opportunity to search all fields, and as far as I've understood there are at least two options for this:
(1) Specify which fields the query should be made against.
(2) Set up the Solr configuration with copyFields, so that values added to each of the fields are copied to a 'catch-all' field which can be used for searching. In this case, however, I am uncertain how things would turn out, considering that the data types are not all the same for the various fields. Each field is run through filters to a lesser or greater degree, but since copyField values are taken from their original fields before the values have been run through those fields' filters, I would have to apply one single filter chain to all values in the copy field. This, again, would result in integers being 'filtered' just as strings would.
Is this a case where I should use copyFields? At first glance, it seems a bit more 'flexible' to just search all fields. However, maybe there's a cost?
All feedback appreciated! Thanks!
When doing a copy field, the data within the destination field will be indexed using the analyzer defined for that field. So if you define the destination field to hold textual data, it is best to only copy textual data into it. So yes, copying an integer into the same field probably does not make sense. But do you really want the user to be able to search your "number" field in a default search? It makes sense for the title, the composer and the keywords, but maybe not for an integer field that probably represents an id in your database.
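A minimal sketch of what such a catch-all setup might look like in schema.xml, copying only the textual fields and leaving the numeric field out (the field name `text_all` and the type `text_general` are assumptions):

```xml
<!-- catch-all field; indexed for search, not stored -->
<field name="text_all" type="text_general" indexed="true"
       stored="false" multiValued="true"/>

<!-- copy only the textual fields into it -->
<copyField source="title" dest="text_all"/>
<copyField source="composers" dest="text_all"/>
<copyField source="keywords" dest="text_all"/>
```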
Another option to query all fields is to use DisMax. You can specify exactly which fields you want to query, but also define specific boosts for each of them. You can also define a default sort, add an extra boost for more recent documents, and do many other fancy things.
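A sketch of what such an (e)dismax request might look like, using the field names from the question; the boost values are arbitrary, and the `release_date` field used for the recency boost is an assumption:

```python
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "q": "bach",
    # qf lists the fields to search, each with its own boost
    "qf": "title^10 composers^5 keywords^2 version",
    # bf adds a recency boost: newer release_date -> higher score
    "bf": "recip(ms(NOW,release_date),3.16e-11,1,1)",
}
qs = urlencode(params)
```

Note that qf leaves the integer "number" field out entirely, which sidesteps the copyField typing problem from the question.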