I just did two things.
q -> iphone
df -> brand
and
q -> brand:iphone
Both returns same result.
First one looks for iphone string in brand field. Second one returns brand field whose value is phone.
What is the purpose of df field?
There really isn't any difference - but to show WHEN it would be different, you'll have to consider the case when you query a different field than the one provided in df.
q=model:foo&df=brand
This would lead to foo being matched against values in the field model, while brand is ignored. If the person writing the query however didn't specify a field, brand would be searched.
Most of the time you'd want to use the edismax or dismax query type (defType=edismax) to be able to create more suitable rules for which fields to query and the weight between the fields, and to handle how most people use a search field:
defType=edismax&q=foo&qf=brand^10 model
.. would search the fields brand and model for foo, and give a tenfold increase in score if the hit is in the brand field compared to the model field. Just q=foo&qf=brand would replicate your first query, and since edismax also supports parts of the lucene syntax, q=brand:foo&qf=model should also work.
Related
i use solr and i have a trouble with result score. For example
i have such docs with one field (for example "content"):
content = car
content = cars
content = carable awesome
content = awful for carable
And i make search query with such params ":{
"mm":"1",
"q":"car",
"tie":"0.1",
"defType":"dismax",
"fl":"*, score",}
i expect to see the result like this:
car: 5 score
cars: 4.8 score
carable awesome: 3
awful for carable: 3
Word without "s" should be highter, but i have strange things. How i can boost absolute match (like a car)
This happens because the field type you're using for the field has a stemming filter (or an ngramfilter) attached (which makes cars and car generate hits against each other). You can't boost "exact hits" inside such a field, since for Lucene they are the same value. What's stored in the index is the same for both car and cars - the latter is processed down to car as well.
To implement this and get exact hits higher, you add a second field without that filter present that only tokenizes (splits) your content on whitespace and lowercases the token. That way you have a field where cars and car are stored as different tokens, and tokens won't contribute to the score if they're not being matched.
You can use qf in Solr to tell Solr which fields you want to search against, and you can give a boost at the same time - so in your case you'd have qf=exact_field^10 text_field where hits in exact_field would be valued ten times higher than hits in the regular field (the exact boost values will depend on your use case and how you want the query profile to behave).
You can also use the different boost arguments (bq and boost) to apply boosts outside of your regular query (i.e. add a query to bq that replicates your original query), but the previous suggestion will probably work just fine.
I am using Solr (with pySolr) to search products in my database, returning products, facets and facet.pivots:
result = solr.search(query_s, **{
'rows': '24',
'sort': formatted_sort,
'facet': 'on',
'facet.limit': '-1',
'facet.mincount': '1',
'facet.field': ['gender', 'material'],
'facet.pivot': 'brand,series',
'fq': '-in_stock:(0 OR 99 OR 100 OR 101)'
})
The query_s selects specific fields, for example: brand:Target AND gender:Men's.
I would like to combine the above query with a DisMax query which will allow me to combine the above query with a full text search over specified fields. I found an article which demonstrates nested queries. I have tried to implement something like this:
q: "gender:* AND _query_:"{!edismax qf=brand series}Summer""
For some reason 'Target' will return results for Target brand shirts, but only with correct capitalization. 'Summer' which is a series of Target, won't return any results. Why am I not seeing a list of docs ordered by relevancy?
Am I overcomplicating things by using Dismax altogether?
The dismax parsers are useful for making sense of more "natural" queries, i.e. queries where the user is used to just type what they're looking for, and how most search engines work.
In your case it sounds like brand:Target AND gender:Men's are filters for which documents should be shown, while the query is the part that the user has typed. Usually you'll want to have the filters in fq as they don't affect score (i.e. they're exact values matching a field value), and the query in q.
I assume that Summer is what the user would have typed into your search box, which would give you:
q=Summer&defType=edismax&qf=series
But this assumes that the series field is defined as a text field that has an analyzer attached, so that the values are lowercased and split appropriately.
If you also have a description field you'd like to search, you can do:
q=Summer&defType=edismax&qf=series^20 description
.. which would search for Summer in both the series and description fields, but give 20 times more weight to a hit in the series field. This is a good way to naturally boost documents that match more exact data in your documents. If you also include the brand field, you'd be able to let your users search for "target summer" and similar queries.
I have a structure where I want to search my documents and filter/rank/set conditions on my parents. Example, a doc is a match because it contains my searched string, but also because its parent contains a certain value.
Using the graph parser and experimenting with the filter is the best way I have noticed doing this. I tried block join child parser first but it wouldn't do it for me.
The problem I am facing now is that I can't seem to get the filter to work in this way:
traversalFilter="(-field:x) OR (field2:y)"
Meaning, if field does not have value x it is ok, if field has value x and field2 has y its also ok. Other cases is filtered away.
But it won't work. Any help is appreciated!
Edit for more information:
I have set up a test core with all my fields stored in a text_general field. Default solrconfig. I have a simple chaining I'm using from parameter as document id. And a to field storing all ids of each documents children. And the graph parser works fine, its just this kind of filter that does not work for me.
I have documents with field with value a or b.
A query like this:
q=*
fq={!graph from=id to=to returnRoot=false traversalFilter="(field:b)" }id:0
This query filters away any document and its children that do not have b as value on field.
q=*
fq={!graph from=id to=to returnRoot=false traversalFilter="(-field:b)" }id:0
Should then work in the opposite. Filter away documents with b as value. But this does not work for some reason.
Edit:
from solrquerysyntax:
https://wiki.apache.org/solr/SolrQuerySyntax
Pure negative queries (all clauses prohibited) are allowed.- inStock:false finds all field values where inStock is not false
Which is why q=* fq=-(field:x) works fine, in returning all documents not containing value x in field.
So why can't I add the same filter in the graph traversal
EDIT3:
I have now started looking on the graph parser and have noticed that when filtering -(-field:x) is the same as +field:x. But +(-field:x) is not the same as -field:x and does not work.
Suppose I want to create a recommendation system to suggest people you should connect with based off of certain attributes that I know about you and attributes I have about other people that are stored in a Solr index. Is it possible to query the index with a list of attributes (along with boosts for each attribute) and have Solr return scored results even if some of my fields return no matches? The way that I understand that Solr works is that if one of your fields doesn't contain a match in any documents found in your index, you get zero results for the entire query (even if other fields in the query matched) - is that right? What I would hope is that I could query the index and get a list of results back in order of a score given based on how many (and which) fields matched to something, even if some fields have no matches, for example:
Say that there are 2 people documents stored in the index as follows (figuratively):
Person 1:
Industry: Manufacturing
City: Oakland
Person 2:
Industry: Manufacturing
City: San Jose
And say that I perform a pseudo-Solr query that basically says "Search for everyone whose industry is equal to manufacturing and whose city is equal to Oakland". What I would like is to receive both results back in the result set, even though one of the "Persons" does not reside in Oakland. I just want that person to come back as a result with a lower score than Person1. Is this possible? What might a solr query look like to handle this? Assume that I have many more than 2 attributes for each person (so saying that I can use "And" and "Or" in my solr query isn't really feasible.. or is it?) Thanks in advance for your helpful input! (PS I'm using Solr 3.6)
You mention using the AND operator, which is likely your problem.
The default behavior of Lucene, and Solr, query syntax is exactly what you are asking for. A query like:
industry:manufacturing city:oakland
Will match either, with scoring preference on those that match both. See the lucene query syntax documentation
You can use the bq parameter (boost query) does not affect matching, but affects the scores only.
http://localhost:8983/solr/persons/select?q=industry:manufacturing&bq=City:Oakland^2
play with the boosting factor at the end to get the correct balance between matching score, and boosting score.
I am using SolrMeter to test Apache Solr search engine. The difference between Facet fields and Filter queries is not clear to me. SolrMeter tutorial lists this as an exapmle of Facet fields :
content
category
fileExtension
and this as an example of Filter queries :
category:animal
category:vegetable
categoty:vegetable price:[0 TO 10]
categoty:vegetable price:[10 TO *]
I am having a hard time wrapping my head around it. Could somebody explain by example? Can I use SolrMeter without specifying either facets or filters?
Facet fields are used to get statistics about the returned documents - specifically, for each value of that field, how many returned documents have that value for that field. So for example, if you have 10 products matching a query for "soft rug" if you facet on "origin," you might get 6 documents for "Oklahoma" and 4 for "Texas." The facet field query will give you the numbers 6 and 4.
Filter queries on the other hand are used to filter the returned results by adding another constraint. The thing to remember is that the query when used in filtering results doesn't affect the scoring or relevancy of the documents. So for example, you might search your index for a product, but you only want to return results constrained by a geographic area or something.
A facet is an field (type) of the document, so category is the field. As Ansari said, facets are used to get statistics and provide grouping capabilities. You could apply grouping on the category field to show everything vegetable as one group.
Edit: The parts about searching inside of a specific field are wrong. It will not search inside of the field only. It should be 'adding a constraint to the search' instead.
Performing a filter query of category:vegetable will search for vegetable in the category field and no other fields of the document. It is used to search just specific fields rather than every field. Sometimes you know that the term you want only is in one field so you can search just that one field.