Solr query: how to remove the logical interpretation of all special characters? - solr

This is slightly different from removing all special characters from the input string: I want to keep them exactly as they are. For example:
Query: Haha AND LaLa would be tokenized as
1. Haha
2. AND
3. LaLa
not
1. Haha
2. LaLa
By the same logic, an input consisting only of special characters or operators, such as:
`! , . : ; OR NOT < >`
would be treated as the literal characters themselves (not as logical operators), and the query would return results if the database contains names like that.

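You can use the field or raw query parser. Both search for the query string as you typed it, without interpreting query syntax such as AND, OR, or NOT. The field parser still applies the field's analyzer, while the raw parser performs no analysis or transformation at all.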
q = {!field f='title'} Haha AND LaLa
[Debug-Response]
<lst name="debug">
<str name="rawquerystring">{!field f='title'}Haha AND LaLa</str>
<str name="querystring">{!field f='title'}Haha AND LaLa</str>
<str name="parsedquery">title:Haha AND LaLa</str>
<str name="parsedquery_toString">title:Haha AND LaLa</str>
<str name="QParser"/>
</lst>
As you can see from the response, Solr looks for documents that contain "Haha AND LaLa". The field query parser treats "AND" as a search keyword, not as a logical operator.
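If the query is sent over HTTP, the local-params prefix has to be percent-encoded along with the rest of the string. The following is a minimal sketch, assuming a client that builds the query string itself; the helper name `field_query_params` is hypothetical, and `debugQuery=true` is included only to get a debug section like the one shown above.

```python
from urllib.parse import urlencode, parse_qs


def field_query_params(field, text):
    """Build request parameters that search `text` literally in `field`.

    Hypothetical helper for illustration, not part of any Solr client.
    """
    # {!field f=...} hands the text to the field query parser, so AND/OR/NOT
    # and characters such as ! : ; are not interpreted as query syntax.
    return {"q": "{!field f=%s}%s" % (field, text), "debugQuery": "true"}


# urlencode percent-encodes braces, '!' and '=' and turns spaces into '+'.
query_string = urlencode(field_query_params("title", "Haha AND LaLa"))
print(query_string)

# Decoding the query string gives back the exact text Solr will receive.
print(parse_qs(query_string)["q"][0])
```

The encoded string can be appended to the select handler URL of whichever core or collection is being queried.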

Related

search for literal hash character in solr query

I cannot find any documentation on the solr website that indicates how to search for a string that contains a literal hash character inside it.
example:
?q=id_number:723#52
I've tried escaping the hash, 723\#52, and percent-encoding it, 723%2352, but the Solr output shows that the query is cut off at the hash symbol each time:
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">2</int>
<lst name="params">
<str name="q">id_number:723</str>
</lst>
</lst>
Solr tokenizes the query with solr.StandardTokenizerFactory, which drops the # character from the query. You can change the tokenizer in the field type definition.
In your case, for the field type used by id_number, change the tokenizer class from solr.StandardTokenizerFactory to solr.WhitespaceTokenizerFactory.
Note that with this change all other special characters in the query (. : , etc.) will be kept as well.
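Separately from tokenization, it may be worth ruling out the URL itself, since the echoed q parameter in the response already stops at the hash. In a URL, an unencoded # starts the fragment, which is not part of the query string. The sketch below only demonstrates standard URL parsing; the host and port are hypothetical, and whether this is the cause in the asker's setup is an assumption.

```python
from urllib.parse import urlsplit, urlencode, parse_qs

# Hypothetical host and port, used only for illustration.
base = "http://localhost:8983/solr/select"

# A raw '#' starts the URL fragment, so the query string ends before it.
raw = urlsplit(base + "?q=id_number:723#52")
print(raw.query)     # q=id_number:723
print(raw.fragment)  # 52

# urlencode percent-encodes '#' as %23, so it stays inside the q parameter.
encoded = urlencode({"q": "id_number:723#52"})
print(encoded)
print(parse_qs(encoded)["q"][0])  # id_number:723#52
```

If the encoded form still arrives truncated, something between the client and Solr is probably decoding the URL a second time.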

Understanding Solr nested queries

I'm trying to understand Solr nested queries, but I'm having a problem understanding the syntax.
I have the following two indexed documents (among others):
<doc>
<str name="city">Guarulhos</str>
<str name="name">Fulano Silva</str>
</doc>
<doc>
<str name="city">Fortaleza</str>
<str name="name">Fulano Cardoso Silva</str>
</doc>
If I query for q="Fulano Silva"~2&defType=edismax&qf=name&fl=score I have:
<doc>
<float name="score">28.038431</float>
<str name="city">Guarulhos</str>
<str name="name">Fulano Silva</str>
</doc>
<doc>
<float name="score">19.826164</float>
<str name="city">Fortaleza</str>
<str name="name">Fulano Cardoso Silva</str>
</doc>
So I thought that if I queried for:
q="Fulano Silva"~2 AND __query__="{!edismax qf=city}fortaleza" &defType=edismax&qf=name&fl=score
I'd give a slightly higher score to the second document, but I actually get an empty result set with numFound=0.
What am I doing wrong here?
You need to replace the "=" with ":" to use the nested query syntax:
q="Fulano Silva"~2 AND _query_:"{!edismax qf=city}fortaleza" &defType=edismax&qf=name&fl=score
*Use _query_: instead of _query_=
Hope this works...
EDIT: When you say q=, are you specifying the query in a URL, or is the text after the q= being put in an application or the Solr dashboard? If we're talking about a URL, you may need to use percent-encoding to get it to work. I mentioned that below, but since I haven't heard from you, I thought I'd reiterate.
Why don't you do q=name:"Fulano Silva" AND city:"fortaleza"?
Another possibility: q=_query_:"{!edismax qf='name'}Fulano Silva" AND city:"fortaleza"
If you're set on a nested query, select?defType=edismax&q="Fulano Silva" AND _query_:"{!edismax qf='city' v='fortaleza'}" should work, but the results and the way it matches will depend on what analyzers you are using to query and index name and city. Also, if these queries are in your query string, make sure you are encoding them properly.
In order to help you any more, I need to know what you're trying to accomplish with your query. Then perhaps we can be sure you have the right indexing set up, that edismax is the right query handler, etc.
On top of the previous comments, the asker has misspelled _query_ as __query__ (note the double underscores in the second, misspelled, version). Solr expects _query_ to be spelled with exactly one underscore (_) before and one after the word query, not two.
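Putting the answers together (a colon instead of "=", single underscores, and proper encoding), the request could be assembled as in the sketch below. The helper name `nested_query` is hypothetical, and the field names and values are taken from the question.

```python
from urllib.parse import urlencode, parse_qs


def nested_query(main, nested_field, nested_value):
    """Append an edismax nested query on `nested_field` to `main`.

    Hypothetical helper for illustration only.
    """
    # _query_ has one underscore on each side and is followed by ':',
    # just like an ordinary field name.
    return '%s AND _query_:"{!edismax qf=%s}%s"' % (main, nested_field, nested_value)


params = {
    "q": nested_query('"Fulano Silva"~2', "city", "fortaleza"),
    "defType": "edismax",
    "qf": "name",
    "fl": "score",
}

# urlencode takes care of quotes, braces, spaces and the tilde-adjacent text.
url_query = urlencode(params)
print(url_query)
print(parse_qs(url_query)["q"][0])
```

Letting the library encode the parameters avoids stray spaces, such as the one before &defType in the hand-written URL.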

Using alternative label for Solr 4.0 Multivalue-field

I'm struggling a little with the labels of my facet fields. I'm using Solr 4 and feed my Solr index with the Drupal Search API Solr module (http://drupal.org/project/search_api_solr).
I use some taxonomy fields for facets and almost everything is working fine, but I can't change the label of the fields. Say I have the field
"sm_thisisvocname"
Then the field is in the index like
sm_thisisvocname:name
for the values of the field and
sm_thisisvocname:vocabulary:name
for the label of the (taxonomy)-field like "This Is Vocname".
So the XML looks like this:
<lst name="facet_fields">
<lst name="sm_thisisvocname:name">
<int name="C">2</int>
<int name="B">1</int>
<int name="D">1</int>
<int name="E">1</int>
</lst>
</lst>
and
<sm_thisisvocname:vocabulary:name>This Is Vocname</sm_thisisvocname:vocabulary:name>
in the XML. I can't use the query
&facet=true&facet.field=sm_thisisvocname:name
because there are colons in the field names ... Can anybody help me?
You should change your field name so that it does not contain a colon (:), because the colon is treated as a special character in several places in Solr query syntax.
The only documentation I could find says:
Currently a field name must consist of only A-Z, a-z, 0-9, - or _
Field Alias is something that you can check upon, however it too depends on :)
You can also try to escape the : in the field name.
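The naming rule quoted in this answer is easy to check before indexing. The sketch below mirrors that rule with a regular expression; the renaming scheme (replacing each disallowed character with an underscore) is only an assumption for illustration, not something Solr or the Drupal module does.

```python
import re

# Pattern mirroring the quoted rule: only A-Z, a-z, 0-9, '-' or '_'.
SAFE_FIELD_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_safe_field_name(name):
    """Return True if `name` uses only the characters from the quoted rule."""
    return SAFE_FIELD_NAME.fullmatch(name) is not None


def to_safe_field_name(name):
    """Hypothetical renaming scheme: replace each disallowed character with '_'."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", name)


print(is_safe_field_name("sm_thisisvocname:name"))  # False
print(to_safe_field_name("sm_thisisvocname:name"))  # sm_thisisvocname_name
print(is_safe_field_name("sm_thisisvocname"))       # True
```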

Replacing SOLR output field value

I have the Solr query shown below, which works fine.
query:"COMPLEX CONDITION 1" OR query:"COMPLEX CONDITION 2"
I get 4 documents in the result: 2 from condition 1 and 2 from condition 2. I need to know which condition each document belongs to.
I cannot work this out from the result because the conditions are too complex.
What I want to do is change the value of the "status" field in the output.
Let's say status=Active for condition 1 and status=Expired for condition 2.
The current value of status is not accurate, because the status is decided by the conditions I use.
Is there a way to overwrite the output value of any field(s) in Solr?
Have you tried using highlighting to determine which documents matched which condition? If you turn on highlighting (&hl=on&hl.fl=<fields_you're_trying_to_match>), Solr returns a structure called "highlighting" at the end of the results (whether you're returning results in JSON or XML). This structure in turn contains one entry per document, named by the unique key of your index (if there is one), with the fields that matched:
<lst name="highlighting">
<lst name="1">
<arr name="title">
<str>Bob <em>Jones</em></str>
</arr>
<arr name="category">
<str><em>Jones</em> Family</str>
</arr>
<arr name="description">
<str>This is a book about Bob <em>Jones</em>, the patriarch of the <em>Jones</em> Family.</str>
</arr>
</lst>
</lst>
More here:
How to return column that matched the query in Solr..?
I apologize that this doesn't answer the latter part of your question, but it should give you some help with the first part.
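On the client side, the highlighting structure can be turned into a map from document key to matched fields, which can then be used to label each document. This is a minimal sketch using a shortened copy of the sample above; the function name `matched_fields` is hypothetical, and mapping fields to a status such as Active or Expired is left to the caller.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the highlighting sample, with closing tags in place.
SAMPLE = """
<lst name="highlighting">
  <lst name="1">
    <arr name="title">
      <str>Bob <em>Jones</em></str>
    </arr>
    <arr name="category">
      <str><em>Jones</em> Family</str>
    </arr>
  </lst>
</lst>
"""


def matched_fields(highlighting_xml):
    """Map each document key to the list of fields that produced highlights."""
    root = ET.fromstring(highlighting_xml)
    result = {}
    # Each direct <lst> child is one document, named by its unique key.
    for doc in root.findall("lst"):
        result[doc.get("name")] = [arr.get("name") for arr in doc.findall("arr")]
    return result


print(matched_fields(SAMPLE))  # {'1': ['title', 'category']}
```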

Sort Facets by Index with non-ASCII values

We have a field 'facet_tag' that contains tags describing a product. Since the tags are in German, they may contain non-ASCII characters (like umlauts). Here are some possible values:
"Zelte"
"Tunnelzelte"
"Äxte"
"Sägen"
"Softshells"
Now if we query solr for the facets with a query like:
http://<solr_host>:<solr_port>/solr/select?q=*&facet=on&facet.field=facet_tag&facet.sort=index
The sorted result looks like this:
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="facet_tag">
<int name="Softshells">1</int>
<int name="Sägen">1</int>
<int name="Tunnelzelte">1</int>
<int name="Zelte">1</int>
<int name="Äxte">2</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
The tag "Äxte" should be the first item, followed by "Sägen". Obviously Solr does not handle non-ASCII characters well in this case (which is also stated in the documentation for faceted search, see http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort)
Is there any way to let Solr sort these values properly without normalizing umlauts (since we show the values to the user)?
I would use ASCIIFoldingFilterFactory:
Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists.
This way what you index becomes normalized (for example Äxte becomes indexed as Axte), but what is stored doesn't change. That's why you should then get the expected sorting, but the content you'll show will still be the original one (Äxte for example).
UPDATE
This solution doesn't apply to facets, since they use the indexed values. With ASCIIFoldingFilterFactory you get the right sort order, but the facet output shows the normalized characters as well. Basically, you can have either the right sort with the wrong output, or the wrong sort with the right output. Unfortunately, I don't know of any other solution.
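The trade-off can be reproduced outside Solr. The sketch below sorts the tags from the question first by code point, which gives the order Solr returned, and then by a folded key while keeping the original strings for display. The `fold` function is only a rough stand-in for ASCIIFoldingFilterFactory based on Unicode decomposition, and re-sorting facet values in the client is an assumption about what the application can do, not a Solr feature.

```python
import unicodedata

tags = ["Zelte", "Tunnelzelte", "Äxte", "Sägen", "Softshells"]


def fold(text):
    """Rough stand-in for ASCII folding: drop combining marks, lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


# Plain sort compares code points, so 'Ä' (U+00C4) lands after 'Z'.
print(sorted(tags))
# ['Softshells', 'Sägen', 'Tunnelzelte', 'Zelte', 'Äxte']

# Sorting by the folded key keeps the original strings for display.
print(sorted(tags, key=fold))
# ['Äxte', 'Sägen', 'Softshells', 'Tunnelzelte', 'Zelte']
```

This only works when the client receives all facet values it needs to show, since Solr still applies its own order when limiting the list.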
