All query highlight in apache solr

All query highlight in apache solr - solr

As I am trying to do highlighting in apache solar. I see it highlights even single words in lines. I want to highlight if all query matches. how can i set parameters for that?
I do not want to highlight single words in a query. Highlight only if all query matches in a line.
query - What is synchronization and why is it important
result
1) What is <em>synchronization</em> and why is it <em>important</em>?
2) What is typically the MOST <em>important</em> reason to use risk to drive testing efforts?
iIdon't want to highlight queries with single word match i want to highlight only those with exact match. so how can I configure highlight in solar.
and is there another way i can get field name for matched text.

Simple answer: All you need to know about configuring the Solr highlighter are in the Wiki.
Long answer: You should configure the hl.usePhraseHighlighter Parameter to true. Solr will highlight phrase queries accurately as phrases. Normally, the parts of the phrase will be highlighted everywhere instead of only when it forms the given phrase.

Related

Solr query string not working for full text searches

I'm following this tutorial on how to perform indexing on sample documents using Solr. The default collection is "gettingstarted" as shown. Now I'm trying to query it. There are 52 entries as shown:
However, when I replace the q argument with say electronics, it should return 14 results. However, I get nothing.
When I replace the query string q with cat:electronics, then I actually get the 14 results. But why is this the case? isn't q=word supposed to search for word wherever it appears?

No, it's not. Your assumption that:
isn't q=word supposed to search for word wherever it appears?
is wrong. If you're using word as your only query, and nothing more - you're searching for word in the default search field. It does not search all available fields in all available documents.
Also be aware that the default query parser assumes that your query is in the Lucene Query Syntax. To handle more "natural" querying, you can use the edismax query parser. This query parser supports the qf parameter that tells Solr which fields to search, instead of having to use the cat:electronics syntax. Your example would then be q=electronics&qf=cat.
In the example documents you've given, qf=series_t author name cat is probably a decent value to search all these fields for the given query. You can also append ^<weight> to a field name to give hits in the different fields different weights. qf=name^10 cat would give a hit in name ten times the weight of a hit in the cat field.

Solr OR query on a text field

How to perform a simple query on a text field with an OR condition? Something like name:ABC OR name:XYZ so the resulting set would contain only those docs where name is exactly "XYZ" or "ABC"
Dug tons of manuals, cannot figure this out.
I use Solr 5.5.0
Update: Upgraded to Solr 6.6.0, still cannot figure it out. Below are illustrations to demonstrate my issue:
This works:
This works too:
This still works:
But this does not! Omg why!?

There are many ways to perform OR query. Below I have listed some of them. You can select any of it.
[Simple Query]
q=name:(XYZ OR ABC)
[Lucene Query Parser]
q={!lucene q.op=OR df=name v="XYZ ABC"}

Your syntax is right, but what you're asking for isn't what text fields are made for. A text field is tokenized (split into multiple tokens), and each token is searched by itself. So if the text inserted is "ABC DEF GHI", it will be split into three separate tokens, namely "ABC", "DEF" and "GHI". So when you're searching field:ABC, you're really asking for any document that has the token "ABC" somewhere.
Since you want to perform an exact match, you want to query against a field that is defined as a string field, as this will keep the value verbatim (including casing, so the matching will be case sensitive). You can tell Solr to index the same content into multiple fields by adding a copyFile instruction, telling it to take the content submitted for field foo and also copying it into field bar, allowing you to perform both an exact match if needed and a more general search if necessary.
If you need to perform exact, but case insensitive, searches, you can use a KeywordTokenizer - the KeywordTokenizer does nothing, keeping the whole string as a single token, before allowing you to add filters to the analysis chain. By adding a LowercaseFilter you tell Solr to lowercase the string as well before storing it (or querying for it).
You can use the "Analysis" page under the Solr admin page to experiment and see how content for your field is being processed for each step.
After that querying as string_field:ABC OR string_field:XYZ should do what you want (or string_field:(ABC OR XYZ) or a few other ways to express the same.

A wacky workaround I've just come up with:

Manipulating and Removing Facets in Apache Solr

I am creating a front end application which queries through a database using the Apache Solr engine, but I have two issues that I just cannot find the answer to.
When Solr is processing a Facet query, how do I get the facet to be a single phrase ("Department of the Navy (160)") instead of a broken up facet of 4 terms ("Department (160)" "of (200)" "the (200)" "Navy(160)").
Also, how do I remove certain facets from being queried, for example "and" "to" "the" etc.
Thank you.

Looks like your phrase is being indexed into a Text field which, among many things, splits by whitespace. This is very good for full text search but not for faceting.
You can have a duplicate field for this, of type string (and not Text), which is not splitted. You can still use the original field for searching but the new string field for faceting.

Solr Index appears to be valid - but returns no results

Solr newbie here.
I have created a Solr index and write a whole bunch of docs into it. I can see
from the Solr admin page that the docs exist and the schema is fine as well.
But when I perform a search using a test keyword I do not get any results back.
On entering * : *
into the query (in Solr admin page) I get all the results.
However, when I enter any other query (e.g. a term or phrase) I get no results.
I have verified that the field being queried is Indexed and contains the values I am searching for.
So I am confused what I am doing wrong.

Probably you don't have a <defaultSearchField> correctly set up. See this question.
Another possibility: your field is of type string instead of text. String fields, in contrast to text fields, are not analyzed, but stored and indexed verbatim.

I had the same issue with a new setup of Solr 8. The accepted answer is not valid anymore, because the <defaultSearchField> configuration will be deprecated.
As I found no answer to why Solr does not return results from any fields despite being indexed, I consulted the query documentation. What I found is the DisMax query parser:
The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options enable users to influence the score based on rules specific to each use case (independent of user input).
In contrast, the default Lucene parser only speaks about searching one field. So I gave DisMax a try and it worked very well!
Query example:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video
You can also specify which fields to search exactly to prevent unwanted side effects. Multiple fields are separated by spaces which translate to + in URLs:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features+text
Last but not least, give the fields a weight:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features^20.0+text^0.3
If you are using pysolr like I do, you can add those parameters to your search request like this:
results = solr.search('search term', **{
'defType': 'dismax',
'qf': 'features text'
})

In my case the problem was the format of the query. It seems that my setup, by default, was looking and an exact match to the entire value of the field. So, in order to get results if I was searching for the sit I had to query *sit*, i.e. use wildcards to get the expected result.

With solr 4, I had to solve this as per Mauricio's answer by defining type="text_en" to the field.

With solr 6, use text_general.

How do I return only a truncated portion of a field in SOLR?

I have a really large (5000+ characters) text field in SOLR named Description. So far it works great for searching and highlighting. If I perform a search and there are no highlighted portions then I just show the first 300 characters. What I would like to do is just return the 300 characters in the result from SOLR.
I would like to do this because when testing I get improved performance if I return a smaller result. This is probably because the XML doc is smaller so less time on the wire and then the processing is faster because the doc is smaller.
I have thought of using a new field that just stored the first 300 characters. I think this would work, but I was wondering if there was a better or more native solution.

What you're looking for is the highlighting hl.maxAlternateFieldLength (http://wiki.apache.org/solr/HighlightingParameters#hl.maxAlternateFieldLength).
You will need to define the field as its own alternate field. If you want to highlight the field Description, the highlight query parameters would be:
hl=true
hl.fl=Description
f.Description.hl.alternateField=Description
hl.maxAlternateFieldLength=300
Finally, to omit the Description field from the query result, you will have to exclude it from the fl query parameter:
fl=score,url,title,date,othermetadata

When using the Unified Highlighter, hl.alternateField is not available as a query parameter. Instead you can use the hl.defaultSummary query parameter (available since Solr 4.5)
hl.defaultSummary
If true, use the leading portion of the text as a snippet if a proper highlighted snippet can’t otherwise be generated. The default is false.