Solr : Boost Results from a specific collection

We have a Solr index with multiple collections, e.g. collection_data_sales and collection_data_marketing. When a user performs a search, both collections are queried through a collection alias. Both collections use the same Solr schema.
Is there a way to boost the results from a specific collection?
For example, suppose the user selects the sales collection: the search should still run against both collection_data_sales and collection_data_marketing, but documents from collection_data_sales should be boosted.

If you can tell the two collections apart using data in the documents themselves, that is enough. Let's imagine the schema has a field named type, so that documents in collection_data_marketing have type:marketing and documents in collection_data_sales have type:sales.
All you have to do then is use a boost function (the bf parameter of the dismax/edismax query parsers), for example:
bf=sum(product(query($q1),10),product(query($q2),3))&q1=type:sales&q2=type:marketing
In this example sales documents get a weight of 10 and marketing documents a weight of 3.
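As a rough illustration, the request parameters for such a boost could be assembled like this in Python. The helper name build_boost_params, the sample query text, and the choice of edismax are assumptions made for the sketch; the type field and the weights 10 and 3 come from the answer above.

```python
from urllib.parse import urlencode, parse_qs

def build_boost_params(user_query, preferred="sales",
                       preferred_weight=10, other_weight=3):
    """Build edismax parameters that boost documents of the preferred type.

    Hypothetical helper: assumes every document carries a `type` field
    whose value is either "sales" or "marketing".
    """
    other = "marketing" if preferred == "sales" else "sales"
    return {
        "defType": "edismax",  # bf is only honoured by dismax/edismax
        "q": user_query,
        # query($q1) yields the score of the sub-query stored in q1
        "bf": (f"sum(product(query($q1),{preferred_weight}),"
               f"product(query($q2),{other_weight}))"),
        "q1": f"type:{preferred}",
        "q2": f"type:{other}",
    }

params = build_boost_params("quarterly report")
query_string = urlencode(params)  # append to the /select URL of the alias
```

Passing preferred="marketing" swaps the two weights, so the same alias can serve users who pick either collection.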

Related

Solr query multiple collections

I have multiple collections with different fields in their schemas. I would like to search across all of them and have the results ranked with the default ranking across all the collections.
Example: I have a document in collection A in which the word ‘mustang’ occurs 3 times and a document in collection B in which it occurs 2 times. I would like the results to show both documents, with the one from collection A first and the one from collection B second.

Scoring doesn't take only the number of occurrences into account; by default it also depends on how many documents in the collection contain the term. If we're talking about a single term, you can sort by the tf function or something like that. For more complex queries, using collection-wide term frequencies may be the only option (but may be costly).
To create one common collection that queries both, use the CREATEALIAS command in the Collections API. The collections parameter takes a comma-separated list of the collections the alias represents, allowing you to query both A and B through the alias C.
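A minimal sketch of building that CREATEALIAS request in Python follows. The base URL http://localhost:8983/solr and the helper name create_alias_url are assumptions for illustration; the action, name, and collections parameters are those of the Collections API.

```python
from urllib.parse import urlencode

def create_alias_url(base_url, alias, collections):
    """Return the Collections API URL that creates `alias` over `collections`.

    Hypothetical helper; base_url is assumed to be the Solr root URL.
    """
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),  # comma-separated list
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"

# Alias C covering collections A and B, as in the answer above.
url = create_alias_url("http://localhost:8983/solr", "C", ["A", "B"])
```

The comma is percent-encoded as %2C by urlencode; Solr decodes it back when it parses the request.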

Can Solr or ElasticSearch return same results in different orders to different visitors for the same search criteria?

I am developing a Spring-based website and I need to use a search engine to provide "customized" search results. I am considering Solr or Elastic.
Here is what I mean by "customized".
Suppose I have two fields A and B to search against.
Suppose that there are two visitors and I am able to profile them by tracking their activities. Suppose visitor 1 constantly uses or searches for value a (of A) and visitor 2 value b (of B). Now both visitors search for records that satisfy A=a OR B=b.
Can Solr or Elastic return results in different order for visitor 1 and 2? I mean that for example, results with A=a are ahead of only B=b results for visitor 1? And the opposite for visitor 2?
I understand that I need to pass some signal to a search engine to ask the engine to give more "weight" to one of the fields.
Thanks and Regards.

It looks like you just need to give a different weight to the fields you're querying on depending on the user that's executing the query.
You could, for example, use a multi_match query with Elasticsearch, which lets you search on multiple fields and give each a different weight. Here is an example that makes fieldA more important:
{
  "multi_match" : {
    "query" : "this is a test",
    "fields" : [ "fieldA^3", "fieldB" ]
  }
}
That way the score is influenced by the weights that you put on the query, and if you sort by score (default) you get the results in the expected order. The weights assigned to the fields need some fine-tuning though depending on your documents and the query you execute.
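To make the per-visitor idea concrete, here is a small Python sketch that builds a different multi_match body for each visitor. The helper build_multi_match and the two weight maps are hypothetical; how the weights are derived from a visitor's profile is left open.

```python
def build_multi_match(text, field_weights):
    """Build a search body whose field boosts depend on the visitor.

    Hypothetical helper: field_weights maps field name -> boost,
    and a boost of 1 is written without the ^ suffix.
    """
    fields = [
        name if weight == 1 else f"{name}^{weight}"
        for name, weight in field_weights.items()
    ]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}

# Visitor 1 mostly uses field A, visitor 2 mostly uses field B (assumed profiles).
visitor1_body = build_multi_match("this is a test", {"fieldA": 3, "fieldB": 1})
visitor2_body = build_multi_match("this is a test", {"fieldA": 1, "fieldB": 3})
```

Both visitors get the same set of matching documents; only the boosts, and therefore the ordering by score, differ.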

Difference between Solr Facet Fields and Filter Queries

I am using SolrMeter to test the Apache Solr search engine. The difference between facet fields and filter queries is not clear to me. The SolrMeter tutorial lists this as an example of facet fields:
content
category
fileExtension
and this as an example of Filter queries :
category:animal
category:vegetable
category:vegetable price:[0 TO 10]
category:vegetable price:[10 TO *]
I am having a hard time wrapping my head around it. Could somebody explain by example? Can I use SolrMeter without specifying either facets or filters?

Facet fields are used to get statistics about the returned documents: specifically, for each value of that field, how many returned documents have that value. So for example, if 10 products match a query for "soft rug" and you facet on "origin," you might get 6 documents for "Oklahoma" and 4 for "Texas." The facet field query will give you the numbers 6 and 4.
Filter queries, on the other hand, are used to filter the returned results by adding another constraint. The thing to remember is that a query used for filtering doesn't affect the scoring or relevancy of the documents. So for example, you might search your index for a product, but only want to return results constrained to a geographic area or something.
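The difference can be mimicked in a few lines of plain Python. This is only an in-memory imitation of the "soft rug" example with made-up product records, not Solr itself; in a real request the two operations would correspond to facet=true&facet.field=origin and fq=origin:Texas.

```python
from collections import Counter

# Ten hypothetical documents that matched the query "soft rug".
matches = (
    [{"name": f"soft rug {i}", "origin": "Oklahoma"} for i in range(6)]
    + [{"name": f"soft rug {i}", "origin": "Texas"} for i in range(6, 10)]
)

def facet_counts(docs, field):
    """Facet: count the matching documents per value of `field`."""
    return Counter(doc[field] for doc in docs)

def filter_docs(docs, field, value):
    """Filter query: keep only documents satisfying the extra constraint."""
    return [doc for doc in docs if doc[field] == value]

counts = facet_counts(matches, "origin")             # statistics about the result set
texas_only = filter_docs(matches, "origin", "Texas") # a narrower result set
```

Faceting leaves the result set untouched and reports counts about it, while filtering shrinks the result set without changing how the remaining documents are scored.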

A facet is a field (type) of the document, so category is the field. As Ansari said, facets are used to get statistics and provide grouping capabilities. You could apply grouping on the category field to show everything vegetable as one group.
Edit: The parts about searching inside of a specific field are wrong. It will not search inside of the field only. It should be 'adding a constraint to the search' instead.
Performing a filter query of category:vegetable will search for vegetable in the category field and no other fields of the document. It is used to search just specific fields rather than every field. Sometimes you know that the term you want only is in one field so you can search just that one field.

SOLR: Is it possible to index multiple timestamp:value pairs per document?

Is it possible in Solr to index key-value pairs for a single document, like:
Document ID: 100
2011-05-01,20
2011-08-23,200
2011-08-30,1000
Document ID: 200
2011-04-23,10
2011-04-24,100
and then querying for documents with a specific value aggregation in a specific time range, i.e. "give me documents with sum(value) > 0 between 2011-08-01 and 2011-09-01" would return the document with id 100 in the example data above.

Here is a post from the Solr User Mailing List where a couple of approaches for dealing with fields as key/value pairs are discussed.
1) encode the "id" and the "label" in the field value; facet on it;
require clients to know how to decode. This works really well for simple
things where the id=>label mappings don't ever change, and are
easy to encode (ie "01234:Chris Hostetter"). This is a horrible approach
when id=>label mappings do change with any frequency.
2) have a separate type of "metadata" document, one per "thing" that you
are faceting on containing fields for id and the label (and probably a
doc_type field so you can tell it apart from your main docs) then once
you've done your main query and gotten the results back faceted on id,
you can query for those ids to get the corresponding labels. This works
really well if the labels ever change (just reindex the corresponding
metadata document) and has the added bonus that you can store additional
metadata in each of those docs, and in many use cases for presenting an
initial "browse" interface, you can sometimes get away with a cheap
search for all metadata docs (or all metadata docs meeting a certain
criteria) instead of an expensive facet query across all of your main
documents.
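Approach 1 from the quoted post can be sketched as a pair of small Python helpers. The function names are hypothetical, and the sketch assumes the id itself never contains a colon.

```python
def encode_facet_value(item_id, label):
    """Pack id and label into a single field value, e.g. "01234:Chris Hostetter"."""
    return f"{item_id}:{label}"

def decode_facet_value(value):
    """Split an encoded value back into (id, label).

    Splits on the first colon only, so labels may contain colons.
    """
    item_id, label = value.split(":", 1)
    return item_id, label

encoded = encode_facet_value("01234", "Chris Hostetter")
decoded = decode_facet_value(encoded)
```

Every client that reads the facet values needs the same decode step, which is why the post warns against this approach when the id=>label mappings change often.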

Solr: where to store additional information?

I want to attach additional information to each indexed document at index time, and access this information in the same analyzer at query time to compare against it.
So, theoretically, it would be great to write this value into some field of the document and search that field as well at query time.
For example, I have an animals db and I want to find all documents that contain the word 'dog' 3 times (just an example). For my "animals" field I can set up a custom BaseTokenFilterFactory that produces a custom TokenFilter, which just counts all 'dog' words and stores this number somewhere. So where can I store this value so that I can access it at search time?

Your example sounds like something better handled by a custom Similarity or a function query in Solr than by a custom analyzer.
For example, if you are using Solr 4.0 you can use the function termfreq(field,term) to order by the number of times dog appears, or you can use it as a filter like so:
fq={!frange l=3 u=100000}termfreq(animals,"dog")
This will filter out all documents whose animals field doesn't have at least 3 occurrences of the word dog.
The advantage of this method is that it doesn't affect the scoring of the documents; it only filters them.
Filtering by function has been possible since Solr 1.4, so even on a version older than 4.0 (but 1.4 or later) you can easily write the "termfreq" function query yourself.
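For illustration, the frange filter above could be generated from Python as follows. The helper name termfreq_filter and the match-all main query are assumptions for the sketch; the frange syntax and the upper bound of 100000 are taken from the answer.

```python
from urllib.parse import urlencode

def termfreq_filter(field, term, min_count, max_count=100000):
    """Build an fq value keeping docs where `term` occurs at least min_count times.

    Hypothetical helper; assumes Solr 4.0+, where termfreq() is available.
    """
    # Doubled braces produce the literal { and } of the local-params syntax.
    return f'{{!frange l={min_count} u={max_count}}}termfreq({field},"{term}")'

fq = termfreq_filter("animals", "dog", 3)
params = {"q": "*:*", "fq": fq}
query_string = urlencode(params)  # URL-encodes the braces, quotes and spaces
```

Because the condition lives in fq rather than q, it narrows the result set without contributing to the score.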