Solr sorting function fails with colon in field name

I have a very strange problem with Solr's sort-by-function feature. When I do a simple sort by field value, it works fine. Here is the query that works:
q=ss_type:product_variant
sort=fs_field_master_product:field_price asc
However, when I sort by the result of a function applied to that field (here, sub), I get an error. The query:
q is the same as above
sort=sub(fs_field_master_product:field_price,10) asc
The error:
<lst name="error">
<str name="msg">can not sort on a field which is neither indexed nor has doc values: sub(fs_field_master_product:field_price,10)</str>
<int name="code">400</int>
</lst>
I couldn't find much about this kind of issue with sort functions on the Internet, so any help is highly welcome!
P.S. I was inclined to consider the colon in the field's name to be the root of the problem, but Solr doesn't fail in any of the other places where the field is used.

Related

Solr facet behavior

So I'm running into a problem with faceting in Solr. When I query for something, say sourcecode:WOS, I get what you would expect, like so:
This looks fine.
However, if I now try to facet on this field instead, I get this:
As you can see, I now have 4 entries for each sourcecode, and all of them have some weird combination of symbols prepended to them:
<int name="C¨4C1">1433755</int>
<int name="C¨4C1">1433755</int>
<int name="¨4c1">1433755</int>
<int name="¨4c1">1433755</int>
It seems like some Unicode characters are added when I facet on sourcecode, which is a solr.TextField.
Has anyone ever encountered this issue before?
Thanks,
Rasmus Edvardsen

Understanding Solr nested queries

I'm trying to understand Solr nested queries, but I'm having a problem understanding the syntax.
I have the following two indexed documents (among others):
<doc>
<str name="city">Guarulhos</str>
<str name="name">Fulano Silva</str>
</doc>
<doc>
<str name="city">Fortaleza</str>
<str name="name">Fulano Cardoso Silva</str>
</doc>
If I query for q="Fulano Silva"~2&defType=edismax&qf=name&fl=score I get:
<doc>
<float name="score">28.038431</float>
<str name="city">Guarulhos</str>
<str name="name">Fulano Silva</str>
</doc>
<doc>
<float name="score">19.826164</float>
<str name="city">Fortaleza</str>
<str name="name">Fulano Cardoso Silva</str>
</doc>
So I thought that if I queried for:
q="Fulano Silva"~2 AND __query__="{!edismax qf=city}fortaleza" &defType=edismax&qf=name&fl=score
I'd give a slightly higher score to the second document, but actually I get an empty result set with numFound=0.
What am I doing wrong here?
To use the nested query syntax, you need to replace the "=" after _query_ with ":":
q="Fulano Silva"~2 AND _query_:"{!edismax qf=city}fortaleza" &defType=edismax&qf=name&fl=score
That is, use _query_: instead of _query_=
Hope this works.
EDIT: When you say q=, are you specifying the query in a URL, or is the text after the q= being put in an application or the Solr dashboard? If we're talking about a URL, you may need to use percent-encoding to get it to work. I mentioned that below, but since I haven't heard from you, I thought I'd reiterate.
Why don't you do q=name:"Fulano Silva" AND city:"fortaleza"?
Another possibility: q=_query_:"{!edismax qf='name'}Fulano Silva" AND city:"fortaleza"
If you're set on a nested query, select?defType=edismax&q="Fulano Silva" AND _query_:"{!edismax qf='city' v='fortaleza'}" should work, but the results and the way it matches will depend on what analyzers you are using to query and index name and city. Also, if these queries are in your query string, make sure you are encoding them properly.
To help you any further, I need to know what you're trying to accomplish with your query. Then perhaps we can make sure you have the right indexing set up, that edismax is the right query parser, etc.
On top of the previous comments, the asker has misspelled _query_ as __query__ (note the double underscores in the second, misspelled, version). Solr expects _query_ to be spelled with exactly one underscore (_) before and one after the word query, not two.
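The percent-encoding advice above can be sketched in Python. This is a minimal illustration, not part of the original answers: the base URL (http://localhost:8983/solr/select) and the helper name build_nested_query_url are assumptions, and the query text is the corrected one from the first answer.

```python
from urllib.parse import urlencode, parse_qs

# The corrected nested query: one underscore on each side of "query", then a colon.
NESTED_Q = '"Fulano Silva"~2 AND _query_:"{!edismax qf=city}fortaleza"'

def build_nested_query_url(base_url: str) -> str:
    """Return base_url plus a percent-encoded query string (hypothetical helper)."""
    params = {
        "q": NESTED_Q,
        "defType": "edismax",
        "qf": "name",
        "fl": "score",
    }
    # urlencode escapes quotes, braces, spaces and "=" inside each value,
    # so the nested query survives as a single q parameter.
    return base_url + "?" + urlencode(params)

# Assumed host, port and handler path; adjust to your installation.
url = build_nested_query_url("http://localhost:8983/solr/select")
print(url)
```

Decoding the query string with parse_qs gives back the original q value, which is a quick way to check that nothing was lost in encoding.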

Using alternative label for Solr 4.0 Multivalue-field

I'm struggling a little with the labels of my facet fields. I'm using Solr 4 and feed my Solr index with the Drupal search_api_solr module (http://drupal.org/project/search_api_solr).
I use some taxonomy fields for facets and almost everything is working fine. But I can't change the label of the fields. Say I have the field
"sm_thisisvocname"
Then the field is in the index like
sm_thisisvocname:name
for the values of the field and
sm_thisisvocname:vocabulary:name
for the label of the (taxonomy)-field like "This Is Vocname".
So the XML looks like this:
<lst name="facet_fields">
<lst name="sm_thisisvocname:name">
<int name="C">2</int>
<int name="B">1</int>
<int name="D">1</int>
<int name="E">1</int>
</lst>
</lst>
and
<sm_thisisvocname:vocabulary:name>This Is Vocname</sm_thisisvocname:vocabulary:name>
in the XML. I can't use the query
&facet=true&facet.field=sm_thisisvocname:name
because there are colons in the field names. Can anybody help me?
You should change your field name so that it does not contain a colon (:), because Solr treats the colon as a special character in several places in its query syntax.
The only documentation I could find says:
Currently a field name must consist of only A-Z, a-z, 0-9, - or _
Field aliasing is something you can look into; however, it too relies on the colon.
You can also try escaping the : in the field name.
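As a rough sketch of the escaping suggestion: the helper below (escape_field_name, a hypothetical name) puts a backslash before each colon in a field name, on the assumption that the name will appear inside a query string parsed by the standard Lucene/Solr query parser, where a backslash escapes special characters. Whether this helps for a parameter such as facet.field, which takes a field name rather than a query, is not confirmed here.

```python
def escape_field_name(name: str) -> str:
    """Backslash-escape backslashes and colons in a field name (hypothetical helper)."""
    # Escape existing backslashes first so they are not doubled by the next step.
    return name.replace("\\", "\\\\").replace(":", "\\:")

field = "sm_thisisvocname:name"

# Assumed usage: a filter on the value "C" from the facet output above.
fq = f"{escape_field_name(field)}:C"
print(fq)
```

Only the colons that belong to the field name are escaped; the final colon, which separates field and value, is left alone.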

Sort Facets by Index with non-ASCII values

We have a field 'facet_tag' that contains tags describing a product. Since the tags are in German, they may contain non-ASCII characters (like umlauts). Here are some possible values:
"Zelte"
"Tunnelzelte"
"Äxte"
"Sägen"
"Softshells"
Now if we query Solr for the facets with a query like:
http://<solr_host>:<solr_port>/solr/select?q=*&facet=on&facet.field=facet_tag&facet.sort=index
The sorted result looks like this:
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="facet_tag">
<int name="Softshells">1</int>
<int name="Sägen">1</int>
<int name="Tunnelzelte">1</int>
<int name="Zelte">1</int>
<int name="Äxte">2</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
The tag "Äxte" should be the first item, followed by "Sägen". Obviously Solr does not handle non-ASCII characters well in this case (which is also stated in the documentation for faceted search, see http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort)
Is there any way to let Solr sort these values properly without normalizing umlauts (since we show the values to the user)?
I would use ASCIIFoldingFilterFactory:
Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists.
This way what you index becomes normalized (for example Äxte becomes indexed as Axte), but what is stored doesn't change. That's why you should then get the expected sorting, but the content you'll show will still be the original one (Äxte for example).
UPDATE
The solution doesn't apply to facets, since they use the indexed values. With ASCIIFoldingFilterFactory you get the right sort order, but you'll see the normalized characters in the output as well. Basically, you can have the right sort with the wrong output, or the wrong sort with the right output. Unfortunately, I don't know of any other solution.
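To make the trade-off concrete, here is a small Python sketch that is not part of the original answer. It reproduces the code point order from the question and compares it with an order based on a folded key. The fold function only approximates what an ASCII folding filter does (it strips combining marks after Unicode decomposition), and re-sorting the facet labels in the application is an assumption about where such a key could be applied, not something Solr does for you.

```python
import unicodedata

def fold(text: str) -> str:
    """Approximate ASCII folding: decompose, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# Facet labels in the order Solr returned them in the question.
tags = ["Softshells", "Sägen", "Tunnelzelte", "Zelte", "Äxte"]

# Plain string comparison orders by code point, so "Ä" (U+00C4) sorts after "Z".
by_code_point = sorted(tags)

# Sorting by the folded key keeps the original labels but orders them as if unaccented.
by_folded_key = sorted(tags, key=fold)

print(by_code_point)
print(by_folded_key)
```

Because the folded form is used only as a sort key, the labels shown to the user keep their umlauts.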

Apache Solr TermsComponent: How to prevent from splitting words after one character. E.g. "t-shirt"

I'm trying to get autosuggestions for search terms. But I've run into a problem with words containing characters like "-" and "&", which are being split after just one character.
Example:
/solr/terms/?terms=true&terms.fl=item&terms.limit=10&terms.sort=count&terms.prefix=t
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<lst name="terms">
<lst name="item">
<int name="top">11335</int>
<int name="tshirt">10249</int>
<int name="t">10156</int>
<int name="trouser">4771</int>
<int name="tight">1577</int>
</lst>
</lst>
</response>
The problem lies with tshirt and t. "t" only appears within "t-shirt". So how do I prevent Solr from splitting words after just one character when there is no whitespace after it? "t shirt" should be split; "t-shirt" and "h&m" should not.
Thanks for your help!
The field type for item seems to be a text type with WordDelimiterFilterFactory as one of the filters in its analysis chain.
By default, WordDelimiterFilterFactory splits on intra-word delimiters.
So t-shirt generates two tokens, t and shirt, which is why the term t shows up for you.
If you want to use terms for autosuggest, remove or tune the WordDelimiterFilterFactory to fit your requirements.
You can use a TextField with a basic configuration, for example WhitespaceTokenizerFactory plus the lowercase and ASCII folding filters, so that the tokens are analyzed as little as possible and don't come out fragmented.
You can also list words you don't want split in protwords.txt, or map certain characters in wdfftypes.txt so they are not used to split terms.
Also check this link for a flexible autocomplete setup: http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/
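The difference between the two analysis styles can be illustrated with a rough Python simulation. This is only a stand-in for the behaviour described above, not Solr's actual algorithm: split_on_delimiters breaks on every non-alphanumeric character, while split_on_whitespace breaks only on whitespace. Both function names are hypothetical.

```python
import re
from typing import List

def split_on_delimiters(text: str) -> List[str]:
    """Rough stand-in for intra-word splitting: break on any non-alphanumeric run."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]

def split_on_whitespace(text: str) -> List[str]:
    """Rough stand-in for a whitespace tokenizer followed by lowercasing."""
    return text.lower().split()

for phrase in ["t-shirt", "h&m", "t shirt"]:
    print(phrase, split_on_delimiters(phrase), split_on_whitespace(phrase))
```

With whitespace-only splitting, "t-shirt" and "h&m" stay whole while "t shirt" still becomes two tokens, which is the behaviour the question asks for.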
If that's the only problem you have with using the TermsComponent for auto suggestions, the answer you got is perfect, but I'd like to propose an alternative.
The TermsComponent is fast and pretty easy to use, but it has the following limitations:
you can't apply any filter to your suggestions;
you may have trouble with case-sensitive queries: for example, if you use the LowerCaseFilterFactory and index the word Word, you'll get the suggestion only when typing w, not W. You basically need to lowercase the query yourself before submitting it to Solr, since you can't apply any tokenizer or filter to your query.
Depending on your requirements, you might want to consider different ways to make auto suggestions with Solr. The article "Different ways to make auto suggestions with Solr" should be useful for making the right choice.
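The lowercasing step from the second limitation could look like the following sketch. It assumes the same /solr/terms/ path and item field as in the question; terms_url is a hypothetical helper name, and the limit and sort values simply mirror the example request.

```python
from urllib.parse import urlencode

def terms_url(base_url: str, field: str, user_input: str, limit: int = 10) -> str:
    """Build a TermsComponent request with a lowercased prefix (hypothetical helper)."""
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.limit": limit,
        "terms.sort": "count",
        # The prefix is matched against indexed terms as-is, so if the index was
        # built with LowerCaseFilterFactory the client has to lowercase the input.
        "terms.prefix": user_input.strip().lower(),
    }
    return f"{base_url}?{urlencode(params)}"

print(terms_url("/solr/terms/", "item", "T"))
```

Typing an uppercase T then produces the same request as the lowercase t in the question.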

Resources