Solr filter on facets - solr

Each of my documents can have one or more entries of a field called Classes, describing some properties of the document, always of the form:
<field name="Classes">"<Description> - <TypeLabel> - <OriginLabel>"</field>
So for instance a document about food might have the two fields:
<field name="Classes">"Yellow orange - Fruit - California"</field>
<field name="Classes">"Small broccoli - Vegetable - Florida"</field>
I am using Solr 5.0 and a schema.xml file, where I have a multiValued "text_en" field Classes that I copy to a "string" field Classes_asString so that I can do faceting on the whole field and treat it as a big label.
With facet.field on Classes_asString I am getting the facet counts that I want, but now I would like to additionally filter these results.
For example, how do I only get facet results that end with "California"?
Or, in another example, how do I only get facet results that have "Vegetable" between the two "-"?
I have seen the option facet.prefix, but this is not applicable in my case. I would appreciate any help or suggestions.

Maybe this scenario is a good place to use child documents:
Index the Classes info as child documents. Each entry has at least 3 parts (description, type, origin), so it may be worth giving each entry its own document.
Then you should be able to facet on the specific child field, either with a current Solr version if it is supported (not sure), or with the work in this ticket that is not merged yet.
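Until the labels are indexed as separate fields, one stopgap is to filter the facet labels on the client. Below is a minimal sketch in Python. It assumes the facet counts arrive in Solr's default flat JSON layout (label, count, label, count, ...) and that every label follows the "Description - Type - Origin" pattern. The function name and the sample counts are made up for illustration.

```python
def filter_facets(flat_counts, type_label=None, origin_label=None):
    """Keep facet labels whose type and/or origin part matches."""
    kept = {}
    # The flat layout alternates label, count, label, count, ...
    for label, count in zip(flat_counts[0::2], flat_counts[1::2]):
        # Split from the right so a description containing " - " stays whole.
        parts = [p.strip() for p in label.strip('"').rsplit(" - ", 2)]
        if len(parts) != 3:
            continue  # skip labels that do not follow the expected pattern
        _description, type_part, origin_part = parts
        if type_label is not None and type_part != type_label:
            continue
        if origin_label is not None and origin_part != origin_label:
            continue
        kept[label] = count
    return kept


# Hypothetical facet counts for Classes_asString.
facets = [
    "Yellow orange - Fruit - California", 12,
    "Small broccoli - Vegetable - Florida", 7,
    "Green kale - Vegetable - California", 3,
]
print(filter_facets(facets, origin_label="California"))
print(filter_facets(facets, type_label="Vegetable"))
```

This only trims the facet list after the fact. Restricting the matching documents themselves would still need a filter query on the Solr side.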


Sorting of solr documents based on search term in solr

I would like to sort solr documents based on searched term. For example the search term is "stringABC"
Then the order of the results should be
stringABC,
stringABCxxxx,
xxxxstringABCxxxx
The Solr document will contain a lot of fields, e.g. title, description, path, article no., product code, etc.
The default field will contain more than one field, e.g. title, description and path.
So a Solr document will only be returned when the search term matches one of the fields copied into the default field.
Use three fields: one with the exact string, one with an EdgeNGramTokenizer and one with an NGramTokenizer. With the dismax or edismax query parser you can then use qf=field1^10 field2^5 field3 to score hits in these fields according to how you want to prioritize them relative to each other.
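As a rough illustration of the boosted query, the request parameters could be assembled like this in Python. The field names title_exact, title_edge and title_ngram are hypothetical stand-ins for the three fields, and the sketch assumes the edismax parser is available on the request handler.

```python
from urllib.parse import urlencode, parse_qs


def build_ranked_query(term):
    """Build query-string parameters that boost exact > prefix > substring hits."""
    params = {
        "q": term,
        "defType": "edismax",
        # Hypothetical field names: exact string, edge n-grams, n-grams.
        "qf": "title_exact^10 title_edge^5 title_ngram",
    }
    return urlencode(params)


query_string = build_ranked_query("stringABC")
print(query_string)
# Decode again to check what Solr would receive.
print(parse_qs(query_string)["qf"][0])
```

The boost values 10 and 5 come from the answer above and are only a starting point. They usually need tuning against real data.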

how does Solr store documents

I know Solr uses Lucene and Lucene uses an inverted index. But from the Lucene examples I have seen so far, I am not sure I understand how it works in combination with Solr.
Given the following document:
<doc>
<field name="id">9885A004</field>
<field name="name">Canon PowerShot SD500</field>
<field name="manu">Canon Inc.</field>
<field name="inStock">true</field>
</doc>
From the examples I have seen so far, I would think that Lucene has to treat each field as a document. It would then say: the word Canon appears in field name and field manu.
Is the index broken down this much? Or does the index only say: "the word Canon appears in the document with id such and such"?
How does this work exactly when using Lucene with Solr?
What would this document look like in the index? (supposing each field has indexed="true")
I wrote a blog post a few years ago that explains this in detail [1].
Short answer to this question:
"From the examples I have seen so far, I would think that Lucene has to treat each field as a document."
Absolutely NOT.
Lucene's unit of information is the document, which is composed of a map field -> value[s].
A Solr document is just a slightly different representation, as Solr incorporates a schema in which the fields are described.
So in Solr you can just add fields to a document without having to describe their type and other properties (which are stored in the schema), while in Lucene you need to define them explicitly when creating the document.
[1] https://sease.io/2015/07/exploring-solr-internals-lucene.html
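To make the idea concrete, here is a toy inverted index in Python. It is a simplified sketch and not how Lucene is implemented. It relies on the fact that a Lucene term is a pair of field name and text, so "canon" in name and "canon" in manu are separate entries that both point back to the same document. The tokenization (lowercase, split on non-word characters) and the use of the id string as document identifier are simplifying assumptions.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each (field, token) pair to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, value in fields.items():
            # Simplified analysis: lowercase and split on non-word characters.
            for token in re.findall(r"\w+", value.lower()):
                index[(field, token)].add(doc_id)
    return index


docs = {
    "9885A004": {
        "name": "Canon PowerShot SD500",
        "manu": "Canon Inc.",
        "inStock": "true",
    }
}
index = build_index(docs)
for term in sorted(index):
    print(term, sorted(index[term]))
```

The document stays the unit that is returned. The fields only determine which terms point to it.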

How to query a specific document by id

From a previous query I already have the document ID (the uniqueKey in this schema is 'track_id') of the document I'm interested in.
Then I would like to query a sequence of words on that document while highlighting the match.
I can't seem to be able to combine the search parameters in a successful way (all my google searches return purple links :\ ), although I've already tried many combinations these past few days. I also know the field where the matches will be if that's any use in terms of improving match speed.
I'm guessing it should be something like this:
/select?q=track_id:{key_i_already_have} AND/&/{part_I_dont_know} word1 word2 word3
Currently, since I can't combine these two search parameters, I'm only querying the words and thus getting several results from several documents.
Thanks in advance.
Since Solr 4 you can use realtime get, which is much faster than searching the index by id.
http://localhost:8983/solr/get?ids=id1,id2,id3
For index updates to be visible (searchable), some kind of commit must reopen a searcher to a new point-in-time view of the index. The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher. This is primarily useful when using Solr as a NoSQL data store and not just a search index.
You may try applying a filter query on the id. It restricts your search to that one document, so the keywords are only matched there, and with highlighting enabled (hl=true) the matches are highlighted.
Your query will look like:
/select?fq=track_id:DOC_ID&q=word1 word2 word3
Just make sure your id field (track_id in your case) is defined in schema.xml with the type string, so that filter queries can be applied to it.
<field name="id" type="string" indexed="true" stored="true" required="true" />
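A small Python sketch of how such a request could be assembled, including the highlighting parameters. The field name lyrics and the document id are hypothetical, and the id is assumed to contain no characters that need escaping in the query syntax.

```python
from urllib.parse import urlencode, parse_qs


def build_highlight_query(doc_id, words, field):
    """Search the given words in one known document and highlight the matches."""
    params = {
        "q": " ".join(words),
        "fq": f"track_id:{doc_id}",  # restrict the result set to one document
        "df": field,                 # field the unqualified words are searched in
        "hl": "true",                # switch highlighting on
        "hl.fl": field,              # field to generate highlight snippets for
    }
    return "/select?" + urlencode(params)


url = build_highlight_query("TRACK123", ["word1", "word2", "word3"], "lyrics")
print(url)
```

Because the id constraint goes into fq and not into q, it does not influence scoring and can be cached by Solr independently of the words being searched.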

Difference between Solr Facet Fields and Filter Queries

I am using SolrMeter to test the Apache Solr search engine. The difference between facet fields and filter queries is not clear to me. The SolrMeter tutorial lists this as an example of facet fields:
content
category
fileExtension
and this as an example of Filter queries :
category:animal
category:vegetable
category:vegetable price:[0 TO 10]
category:vegetable price:[10 TO *]
I am having a hard time wrapping my head around it. Could somebody explain by example? Can I use SolrMeter without specifying either facets or filters?
Facet fields are used to get statistics about the returned documents: specifically, for each value of that field, how many returned documents have that value. So for example, if you have 10 products matching a query for "soft rug" and you facet on "origin," you might get 6 documents for "Oklahoma" and 4 for "Texas." The facet field query will give you the numbers 6 and 4.
Filter queries on the other hand are used to filter the returned results by adding another constraint. The thing to remember is that the query when used in filtering results doesn't affect the scoring or relevancy of the documents. So for example, you might search your index for a product, but you only want to return results constrained by a geographic area or something.
A facet is a field (type) of the document, so category is the field. As Ansari said, facets are used to get statistics and provide grouping capabilities. You could apply grouping on the category field to show everything vegetable as one group.
Edit: The parts about searching inside of a specific field are wrong. It will not search inside of the field only. It should be 'adding a constraint to the search' instead.
Performing a filter query of category:vegetable will search for vegetable in the category field and no other fields of the document. It is used to search just specific fields rather than every field. Sometimes you know that the term you want only is in one field so you can search just that one field.
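The difference can be mimicked on an in-memory list in Python. This is only an analogy, not Solr code. The ten sample products reuse the "soft rug" numbers from the answer above, and both function names are made up.

```python
from collections import Counter

# Ten hypothetical products that all matched the query "soft rug".
products = [
    {"name": f"soft rug {i}", "origin": "Oklahoma" if i < 6 else "Texas"}
    for i in range(10)
]


def facet_counts(docs, field):
    """Faceting: count how many documents have each value of the field."""
    return Counter(doc[field] for doc in docs)


def filter_query(docs, field, value):
    """Filtering: keep only the documents that satisfy the extra constraint."""
    return [doc for doc in docs if doc[field] == value]


print(facet_counts(products, "origin"))
print(len(filter_query(products, "origin", "Texas")))
```

Faceting leaves the result set as it is and reports counts about it, while filtering shrinks the result set. A typical UI shows the facet counts and adds a filter query when the user clicks one of them.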

Solr - Results that contain all terms, in any order

In a SOLR install, when I search against a field with a multi-word search term I want SOLR to return documents that have all of the terms in the search, but they do not need to be in the exact order.
For example, if I search for title of Brown Chicken Brown Cow, I want to find all documents that contain all of the terms Brown, Chicken and Cow, irrespective of order in the title field. So, for example, the title "The chicken and the cow have brown poop" should match the query. AFAIK, this is how Google executes searches as well.
I have experimented with the following query formats:
1. Title:Brown AND Title:Chicken
2. Title:Brown AND Chicken
3. Title:Brown+Chicken
I am very confused by the results. In some instances, the first two queries return the same exact set of results. In other instances, the first version will return many results and the second version will return none. The third version seems to meet my needs, but I am confused by the different meaning of the queries.
All of my tests have been run against a field of type text_en.
<field name="Title" multiValued="false" type="text_en" indexed="true" stored="true"/>
So, what's the best SOLR query/set up for this type of search? Also, is there an easy way to make Solr.NET take a user entered search term and convert it to this type of format?
Also, will SOLR by default give documents that match the order of the search phrase a higher relevancy score? If not, what are the right levers to pull to make that happen?
Edit:
Some of my confusion was caused by searching against non-default fields vs. default fields. Knowing this, the only format that works consistently is the first format.
If I were you I would try to use:
Title:(Brown Chicken)
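Parentheses apply the field to every term inside them, which makes this equivalent to your query no. 1 when the default operator is AND (otherwise write Title:(Brown AND Chicken)). Quotation marks will force Solr to search for an exact phrase match, including word order.
Please try Title:"Brown Chicken" or use the DisMax query parser to handle your queries.
The wiki for the Lucene query parser says (emphasis mine):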
....Since text is the default field, the field indicator is not required.
Note: The field is only valid for the term that it directly precedes,
so the query
title:Do it right
Will only find "Do" in the title field. It will find "it" and "right"
in the default field (in this case the text field).
Do you have only the title field in your data model?
Please run your query with debugQuery=on to see how the results are scored; see it in action: https://stackoverflow.com/a/9262300/604511
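For turning a user-entered phrase into the "all terms, any order" form, a helper along these lines could work. It is a sketch in Python, not Solr.NET code. The function name is made up, and it does not escape query-syntax characters or handle words such as AND and OR typed by the user.

```python
def all_terms_query(field, user_input):
    """Require every distinct word of the input in the given field, in any order."""
    # dict.fromkeys removes repeated words while keeping their first position.
    terms = list(dict.fromkeys(user_input.split()))
    if not terms:
        return "*:*"  # assumption: an empty input should match everything
    return f"{field}:({' AND '.join(terms)})"


print(all_terms_query("Title", "Brown Chicken Brown Cow"))
```

Spelling out AND between the terms keeps the result independent of the default operator configured for the core.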
