How to get a human readable value of facet in solr - solr

When I search SOLR with query (grouptype is my field name)
/select?q=grouptype:*&wt=json
I get group values in human readable format ex: Type 1, Type 2 etc.
But when I do a faceted search
/select?q=*:*&rows=0&facet=on&facet.field=grouptype&wt=json
I get facet values like
"type1",9226
"type2",7668
How can I get facet values in human readable format like I got from the earlier query?

Related

Match all documents excluding some terms using full Lucene syntax

Our service's default search web page uses the * full Lucene query to match all documents. This is before the user has provided any search terms. There is some data (test data, in our case) that we want to exclude from the search result.
Is it possible to match all documents but exclude a subset of all documents?
For example, suppose we have an "owners" field and we want to exclude documents with the "testA" and "testB" owner. The following query does not seem to work with the match all approach:
Query: search=* -owners:testA -owners:testB&queryType=full&$orderby=created desc
Error: "Failed to parse query string. See https://aka.ms/azure-search-full-query for supported syntax."
When searching for anything but *, this approach works fine. For example:
Query: search=foo -owners:testA -owners:testB&queryType=full&$orderby=created desc
Result: (many documents matched)
I have considered a $filter for this and using $filter=filterableOwners/all(p: p ne 'testa' and p ne 'testb') but this has the following drawbacks:
the index must be rebuilt with a filterable field
analyzers can't be used so case-insensitivity must be implemented by lowercasing the values and filter expression
Ideally this could be done using only the search query parameter with a Lucene query text.
I found a workaround for the issue. If you have a field in your documents that always has a value, you can use a .* regex to match all values in the field and therefore match all documents.
For example, suppose the packageId field has a value for all documents.
Incorrect (as posted in the original question):
Query: search=* -owners:testA -owners:testB&queryType=full&$orderby=created desc
Correct:
Query: search=packageId:/.*/ -owners:testA -owners:testB&queryType=full&$orderby=created desc
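As a minimal sketch of the workaround above, the query parameters can be assembled in Python. The helper name build_match_all_excluding is hypothetical, and the sketch assumes packageId really is populated on every document, as the answer states.

```python
from urllib.parse import urlencode


def build_match_all_excluding(always_set_field, excluded_owners, order_by="created desc"):
    """Build parameters that match every document except the excluded owners.

    Hypothetical helper: `always_set_field` must have a value on every
    document, otherwise documents without it are silently left out.
    """
    # The /.*/ regex on an always-populated field stands in for the bare "*",
    # which the full Lucene parser rejected when combined with negations.
    terms = [f"{always_set_field}:/.*/"]
    terms += [f"-owners:{owner}" for owner in excluded_owners]
    return {
        "search": " ".join(terms),
        "queryType": "full",
        "$orderby": order_by,
    }


params = build_match_all_excluding("packageId", ["testA", "testB"])
query_string = urlencode(params)  # URL-encoded form for the request
print(params["search"])
print(query_string)
```

The unencoded search value is the same text as the "Correct" query above; urlencode only takes care of escaping it for the request.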

Solr - Nested Edismax Query

I am using Solr (with pySolr) to search products in my database, returning products, facets and facet.pivots:
result = solr.search(query_s, **{
    'rows': '24',
    'sort': formatted_sort,
    'facet': 'on',
    'facet.limit': '-1',
    'facet.mincount': '1',
    'facet.field': ['gender', 'material'],
    'facet.pivot': 'brand,series',
    'fq': '-in_stock:(0 OR 99 OR 100 OR 101)'
})
The query_s selects specific fields, for example: brand:Target AND gender:Men's.
I would like to combine the above query with a DisMax query which will allow me to combine the above query with a full text search over specified fields. I found an article which demonstrates nested queries. I have tried to implement something like this:
q: "gender:* AND _query_:"{!edismax qf=brand series}Summer""
For some reason 'Target' will return results for Target brand shirts, but only with correct capitalization. 'Summer' which is a series of Target, won't return any results. Why am I not seeing a list of docs ordered by relevancy?
Am I overcomplicating things by using Dismax altogether?
The dismax parsers are useful for making sense of more "natural" queries, i.e. queries where the user is used to just typing what they're looking for, which is how most search engines work.
In your case it sounds like brand:Target AND gender:Men's are filters for which documents should be shown, while the query is the part that the user has typed. Usually you'll want to have the filters in fq as they don't affect score (i.e. they're exact values matching a field value), and the query in q.
I assume that Summer is what the user would have typed into your search box, which would give you:
q=Summer&defType=edismax&qf=series
But this assumes that the series field is defined as a text field that has an analyzer attached, so that the values are lowercased and split appropriately.
If you also have a description field you'd like to search, you can do:
q=Summer&defType=edismax&qf=series^20 description
.. which would search for Summer in both the series and description fields, but give 20 times more weight to a hit in the series field. This is a good way to naturally boost documents that match more exact data in your documents. If you also include the brand field, you'd be able to let your users search for "target summer" and similar queries.
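A rough sketch of that split between q and fq, written as plain Python so it runs without a Solr server. The function name build_search_params and the quoting of filter values are assumptions for illustration. The field names and weights come from the examples above.

```python
def build_search_params(user_text, filters, qf="series^20 description"):
    """Split a search into the scored query (q) and non-scoring filters (fq).

    Hypothetical helper: filter values are wrapped in double quotes, so a
    value that itself contains a double quote would need escaping first.
    """
    # Exact-value constraints go into fq so they restrict the result set
    # without influencing the relevancy score.
    fq = [f'{field}:"{value}"' for field, value in filters.items()]
    params = {
        "defType": "edismax",  # let edismax parse the user's text
        "qf": qf,              # fields to search, with optional ^weights
        "fq": fq,
    }
    return user_text, params


q, params = build_search_params("Summer", {"brand": "Target", "gender": "Men's"})
print(q)
print(params)
```

With pySolr, the result could be passed the same way as in the question, as solr.search(q, **params), together with the existing facet parameters.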

Solr Boosting specific field values

I'm trying to boost the score for documents returned from a search in solr.
The boost I want to achieve is something along the lines of:
field1:(value1)^5 OR field2:(value2)^2
If the document has field1 matching value1, boost by 5.
If the document has field2 matching value2, boost by 2.
The documents have many fields, let's call them field1, field2... and may be missing certain fields.
The documents do not need to have field1 or field2 matching value1, value2 respectively.
I have other filter queries such as:
fq: field1:[* TO *] <- checking for presence of
fq: field3: ("something" "somethingelse")
fq: field4: 1
I am grouping my results by a certain field not being used in any of the queries.
Raw query parameters:
group=true&group.facet=true&group.field=anIndependentField
I am using the same fq's with each of the query parsers I have tried.
There are enough documents in solr with field1:value1 and/or field2:value2 as well as other values for those fields.
So far I've tried using the query parsers:
Standard Query Parser
method a) q: field1:(value1)^5 OR field2:(value2)^2 // no results
method b) q: *:* OR field1:(value1)^5 OR field2:(value2)^2 // no results
method c) q: (value1)^5 OR (value2)^2 // incorrect. looks for complete match.
method d) q: (value1)^5 (value2)^2 // incorrect. looks for complete match
EDisMax Query Parser
(defType=edismax)
q: *:*
bq: field1:(value1)^5 OR field2:(value2)^2
Problem with this one is that results are not in expected order.
A document that has field1:somethingElse and field2:somethingElse2 got a higher score than a document that has field1: somethingElse and field2:value2.
Can anyone see what I'm doing wrong or has a suggestion to improve the relevancy of my search queries?
You can use the bf parameter of eDismax queryParser in the following way:
bf=if(termfreq(field1,"value1"),5,if(termfreq(field2,"value2"),2,1))
Please find below the complete query.
https://<MY_SERVER_NAME>:9443/solr/<MY_COLLECTION>/select?q=*%3A*&wt=json&indent=true&defType=edismax&bf=if(termfreq(field1%2C%22value1%22)%2C5%2Cif(termfreq(field2%2C%22value2%22)%2C2%2C1))
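For more than two rules the nested if() gets hard to write by hand. The sketch below builds the same kind of bf expression from a list of rules. The helper name nested_if_boost is hypothetical, and the first matching rule wins, as in the expression above.

```python
from urllib.parse import urlencode, parse_qs


def nested_if_boost(rules, default=1):
    """Build a nested if(termfreq(...)) function for the edismax bf parameter.

    Hypothetical helper. `rules` is a list of (field, value, boost) tuples
    in priority order; `default` is used when no rule matches.
    """
    expr = str(default)
    # Wrap from the innermost (lowest priority) rule outwards.
    for field, value, boost in reversed(rules):
        expr = f'if(termfreq({field},"{value}"),{boost},{expr})'
    return expr


bf = nested_if_boost([("field1", "value1", 5), ("field2", "value2", 2)])
query_string = urlencode({"q": "*:*", "wt": "json", "defType": "edismax", "bf": bf})
print(bf)
print(query_string)
```

Note that with this shape a document matching both rules only receives the first boost. Whether that is the desired behaviour depends on the ranking you are after.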

Difference between Solr Facet Fields and Filter Queries

I am using SolrMeter to test the Apache Solr search engine. The difference between facet fields and filter queries is not clear to me. The SolrMeter tutorial lists this as an example of facet fields:
content
category
fileExtension
and this as an example of Filter queries :
category:animal
category:vegetable
category:vegetable price:[0 TO 10]
category:vegetable price:[10 TO *]
I am having a hard time wrapping my head around it. Could somebody explain by example? Can I use SolrMeter without specifying either facets or filters?
Facet fields are used to get statistics about the returned documents: specifically, for each value of that field, how many returned documents have that value. So for example, if you have 10 products matching a query for "soft rug" and you facet on "origin," you might get 6 documents for "Oklahoma" and 4 for "Texas." The facet field query will give you the numbers 6 and 4.
Filter queries on the other hand are used to filter the returned results by adding another constraint. The thing to remember is that the query when used in filtering results doesn't affect the scoring or relevancy of the documents. So for example, you might search your index for a product, but you only want to return results constrained by a geographic area or something.
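The difference can be imitated on an in-memory list, purely as an illustration. This is not how Solr works internally, and the documents and helper names below are made up to mirror the "soft rug" example.

```python
from collections import Counter

# Ten made-up documents that all match the query "soft rug":
# six from Oklahoma and four from Texas.
docs = [
    {"id": i, "name": "soft rug", "origin": "Oklahoma" if i < 6 else "Texas"}
    for i in range(10)
]


def facet_counts(results, field):
    """Faceting: count how many results carry each value of `field`."""
    return Counter(doc[field] for doc in results)


def apply_filter(results, field, value):
    """Filtering: keep only results matching the constraint; order is untouched."""
    return [doc for doc in results if doc[field] == value]


counts = facet_counts(docs, "origin")
texas_only = apply_filter(docs, "origin", "Texas")
print(dict(counts))
print(len(texas_only))
```

Faceting leaves the result list alone and reports counts next to it, while filtering shrinks the result list itself. A typical UI shows the facet counts as links and adds a filter query when the user clicks one.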
A facet is a field (type) of the document, so category is the field. As Ansari said, facets are used to get statistics and provide grouping capabilities. You could apply grouping on the category field to show everything vegetable as one group.
Edit: The parts about searching inside of a specific field are wrong. It will not search inside of the field only. It should be 'adding a constraint to the search' instead.
Performing a filter query of category:vegetable will search for vegetable in the category field and no other fields of the document. It is used to search just specific fields rather than every field. Sometimes you know that the term you want only is in one field so you can search just that one field.

Score terms by field type in a document

I'm indexing several different fields in a document using Apache SOLR 3.6.
When I search for a term, SOLR returns all occurrences of the term in each field. However, the score is the same no matter which field the term occurred in. For example, if USC occurred in the title field and in the contents field, both matches get the same score.
Is there a way to index a document of different fields and have a weighted score based on the type of field within the document?
Use dismax or edismax and set the qf (query fields) parameter to something like this to give the title more weight than the body:
qf=title^3 body
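If the weights live in configuration, the qf string can be generated rather than typed. This is a small sketch with a hypothetical helper name, format_qf, and the search term USC taken from the question.

```python
def format_qf(weights):
    """Turn {"field": boost} into a qf string such as "title^3 body".

    Hypothetical helper: a boost of 1 is written as the bare field name.
    """
    parts = []
    for field, boost in weights.items():
        parts.append(field if boost == 1 else f"{field}^{boost}")
    return " ".join(parts)


qf = format_qf({"title": 3, "body": 1})
params = {"q": "USC", "defType": "edismax", "qf": qf}
print(params)
```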
